feat(core): chDB connection pool, Arrow WAL zero-copy flush, and InfluxQL parity by austin-barrington · Pull Request #52 · hyperbyte-cloud/hyperbytedb

austin-barrington · 2026-06-23T19:25:59Z

Re-architect the write path around chDB-ready Arrow batches, enable real
same-path connection pooling for concurrent flush inserts and queries, and
close major gaps in InfluxDB v1 TimeseriesQL semantics (DDL, SHOW, CQ
scheduling, rollups/MVs, SELECT INTO). Bump workspace to 0.8.3.

This is a cross-cutting release: ingest, WAL, flush, chDB adapter, query
translation, cluster schema apply, and observability all move together so
the fast path is correct end-to-end rather than bolted on in one layer.

- Add `ChdbConnectionPool`: N independent `Connection`s to the same
  `--path`, each with its own `ChdbClient` mutex (libchdb is a
  process-global singleton per path; see chdb/insert/concurrency.rs).
- Round-robin checkout with `try_lock` on busy slots; clamp `pool_size`
  to 1..=32 (default 4).
- Rewire `ChdbNativeAdapter`, `ChdbQueryAdapter`, and `ChdbSession` to
  use the pool instead of a single shared session.
- Fix config/docs: `chdb.pool_size > 1` is now real parallelism, not a
  deprecated no-op. Recommend `server.max_concurrent_queries >= pool_size`.
- Update system-architecture.md to describe same-path multi-connection
  semantics (replacing the old per-slot subdirectory model).
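
The checkout policy described above (round-robin start, `try_lock` to skip busy slots, size clamped to 1..=32) could look roughly like the following std-only sketch. `Conn`, `Pool`, and `checkout` are hypothetical names, not the PR's `ChdbConnectionPool` API, and blocking on the starting slot when every slot is busy is an assumption; the description does not say what happens in that case.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Mutex, MutexGuard};

/// Hypothetical stand-in for one chDB connection.
pub struct Conn {
    pub id: usize,
}

/// Hypothetical pool: one mutex per connection plus a round-robin cursor.
pub struct Pool {
    slots: Vec<Mutex<Conn>>,
    next: AtomicUsize,
}

impl Pool {
    /// Clamp the requested size to 1..=32, as the description states.
    pub fn new(requested: usize) -> Self {
        let size = requested.clamp(1, 32);
        let slots = (0..size).map(|id| Mutex::new(Conn { id })).collect();
        Pool { slots, next: AtomicUsize::new(0) }
    }

    pub fn size(&self) -> usize {
        self.slots.len()
    }

    /// Start at the next round-robin slot and skip busy slots with `try_lock`.
    /// Assumption: if every slot is busy, block on the starting slot.
    pub fn checkout(&self) -> MutexGuard<'_, Conn> {
        let n = self.slots.len();
        let start = self.next.fetch_add(1, Ordering::Relaxed) % n;
        for offset in 0..n {
            if let Ok(guard) = self.slots[(start + offset) % n].try_lock() {
                return guard;
            }
        }
        // Recover the guard even if a previous holder panicked.
        self.slots[start].lock().unwrap_or_else(|e| e.into_inner())
    }
}
```

With this shape a caller holds the guard for the duration of one insert or query, and dropping the guard returns the slot to the pool.
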
- Build chDB-ready fact-table `RecordBatch`es at ingest time
  (`application/arrow_ingest/` for line protocol, columnar, msgpack, points).
- Introduce `PreparedWalSlot` / `PreparedMeasurementBatch` domain types with
  post-assign `ingest_seq` patching (`domain/prepared_wal.rs`).
- Coalesce sparse prepared batches per measurement (`domain/arrow_coalesce`).
- In-memory `WalArrowCache` indexes unflushed prepared slots by sequence;
  bounded `take_range(from, to_inclusive)` avoids evicting post-snapshot
  entries (fixes steady ~50% cache-miss under continuous load).
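
To illustrate why a bounded `take_range(from, to_inclusive)` leaves post-snapshot entries in place, here is a minimal sketch over a `BTreeMap` keyed by sequence number. `SeqCache` is a hypothetical type and `T` stands in for a prepared slot; the real `WalArrowCache` may be structured differently.

```rust
use std::collections::BTreeMap;

/// Hypothetical sequence-indexed cache of unflushed entries.
pub struct SeqCache<T> {
    entries: BTreeMap<u64, T>,
}

impl<T> SeqCache<T> {
    pub fn new() -> Self {
        SeqCache { entries: BTreeMap::new() }
    }

    pub fn insert(&mut self, seq: u64, slot: T) {
        self.entries.insert(seq, slot);
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }

    /// Remove and return entries with `from <= seq <= to_inclusive`, in
    /// sequence order. Entries outside the range stay cached.
    pub fn take_range(&mut self, from: u64, to_inclusive: u64) -> Vec<(u64, T)> {
        if from > to_inclusive {
            return Vec::new();
        }
        // Keys < from stay in self.entries; keys >= from move to `taken`.
        let mut taken = self.entries.split_off(&from);
        // Keys > to_inclusive were written after the snapshot: put them back.
        if let Some(next) = to_inclusive.checked_add(1) {
            let after = taken.split_off(&next);
            self.entries.extend(after);
        }
        taken.into_iter().collect()
    }
}
```

A flush that snapshots sequences 2..=4 takes exactly those entries, so a write that lands at sequence 5 during the flush is still a cache hit on the next cycle.
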
- Versioned on-disk WAL encoding via `wal_ipc` (`HBWA` magic, v1): optional
  `storage.wal_format = "arrow_ipc"` alongside legacy bincode.
- `FlushService` flushes prepared chunks directly through
  `insert_record_batch_direct`, with no re-parse/re-coalesce on the hot path.
- `flush.arrow_wal_enabled` (default true) gates the RAM cache; metrics
  gauge `hyperbytedb_wal_arrow_cache_entries` for growth/OOM watch.
- `application/wal_append.rs` bundles prepared slots with legacy `WalEntry`
  for peer sync compatibility.
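
A versioned encoding that coexists with a legacy format usually comes down to a magic-plus-version prefix check on read. The sketch below assumes a layout of 4 magic bytes (`HBWA`), one version byte, then the payload; the PR text confirms only the magic and the version number, so the layout, the type names, and the fallback to legacy decoding are all assumptions.

```rust
const MAGIC: &[u8; 4] = b"HBWA";
const VERSION: u8 = 1;

#[derive(Debug, PartialEq)]
pub enum WalPayload<'a> {
    /// Bytes following a recognised HBWA v1 header.
    ArrowIpc(&'a [u8]),
    /// No magic present: hand the whole record to the legacy decoder.
    Legacy(&'a [u8]),
}

#[derive(Debug, PartialEq)]
pub enum DecodeError {
    Truncated,
    UnsupportedVersion(u8),
}

/// Prefix a payload with the assumed header layout: magic, then version.
pub fn encode(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(MAGIC.len() + 1 + payload.len());
    out.extend_from_slice(MAGIC);
    out.push(VERSION);
    out.extend_from_slice(payload);
    out
}

/// Classify a record by its prefix without copying the payload.
pub fn decode(bytes: &[u8]) -> Result<WalPayload<'_>, DecodeError> {
    match bytes.strip_prefix(MAGIC) {
        Some([version, rest @ ..]) if *version == VERSION => Ok(WalPayload::ArrowIpc(rest)),
        Some([version, ..]) => Err(DecodeError::UnsupportedVersion(*version)),
        Some([]) => Err(DecodeError::Truncated),
        None => Ok(WalPayload::Legacy(bytes)),
    }
}
```

Rejecting unknown versions rather than guessing keeps a newer on-disk format from being misread by an older binary.
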
- `build_prepared_wal_slot`, `write_prepared_batch`, schema cache refresh
  from metadata, and field-type widening reconciliation
  (`ALTER TABLE ... MODIFY COLUMN` when metadata union exceeds cached types).
- Engine DDL: raw facts use `ReplacingMergeTree(ingest_seq)`; rollup/MV
  destinations with additive partials use `SummingMergeTree` on sum columns.
- Pad sparse legacy WAL batches to full ensured column sets before insert.
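
The widening reconciliation can be pictured as comparing the cached column type with the union recorded in metadata and emitting DDL only when the union is wider. In this sketch the `FieldType` enum, its ordering (`Int64 < Float64 < String`), and the function name are assumptions made for illustration; the PR does not spell out its widening rules, and identifier quoting is omitted.

```rust
/// Hypothetical field types, ordered narrow to wide by declaration order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum FieldType {
    Int64,
    Float64,
    String,
}

impl FieldType {
    /// ClickHouse column type name for this field type.
    fn sql(self) -> &'static str {
        match self {
            FieldType::Int64 => "Int64",
            FieldType::Float64 => "Float64",
            FieldType::String => "String",
        }
    }
}

/// Return a MODIFY COLUMN statement only when the metadata union is wider
/// than the cached type; otherwise no DDL is needed.
pub fn widen_ddl(
    table: &str,
    column: &str,
    cached: FieldType,
    metadata_union: FieldType,
) -> Option<String> {
    if metadata_union > cached {
        Some(format!(
            "ALTER TABLE {table} MODIFY COLUMN {column} {}",
            metadata_union.sql()
        ))
    } else {
        None
    }
}
```

Returning `None` for equal or narrower types keeps the hot path free of DDL when the cache is already up to date.
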
- New HTTP `/internal/chdb` adapter hook for admin/debug paths.
- Depend on chdb-rust `feat_arrow_insert` (Arrow C Data Interface insert);
  Docker builds clone that branch; root/proxy Dockerfiles stub all workspace
  crates for layer caching.
- Split parser: dedicated `lexer.rs` + `ddl_parser.rs` for token-driven
  InfluxQL DDL/SHOW/auth (CREATE/DROP/ALTER DB/RP/user, GRANT/REVOKE,
  SHOW DATABASES/MEASUREMENTS/TAG KEYS/TAG VALUES/FIELD KEYS/SERIES/CQs/MVs).
- Major `to_clickhouse.rs` expansion:
  - Raw selects always project `time`, ascending order.
  - GROUP BY time defaults, fill/null/with bounds, tag ordering.
  - Materialized view backfill column ordering and dest insert mapping.
  - Rollup fact views (`build_coalesced_fact_view_*`): sum for additive
    fields, mean → sum/count rewrite on rollup measurements.
  - CQ bounded SELECT INTO translation and time-window predicate stripping.
- `predicate_sql`: shared WHERE → SQL for DELETE / DROP SERIES (local +
  replication).
- `field_type` domain module; `rollup` combine semantics for MV/CQ fields.
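
The mean → sum/count rewrite matters because averaging per-bucket means gives the wrong answer when buckets hold different numbers of points. A small worked sketch, with `Partial`, `combine`, and `mean` as hypothetical names rather than the PR's rollup types:

```rust
/// Hypothetical additive partial stored per rollup bucket.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Partial {
    pub sum: f64,
    pub count: u64,
}

/// Partials combine by plain addition, which is what makes them safe to
/// merge in any order (and what a summing engine can do on merge).
pub fn combine(parts: &[Partial]) -> Partial {
    parts.iter().fold(Partial { sum: 0.0, count: 0 }, |acc, p| Partial {
        sum: acc.sum + p.sum,
        count: acc.count + p.count,
    })
}

/// The mean is derived only at read time, from the combined partial.
pub fn mean(p: Partial) -> Option<f64> {
    if p.count == 0 {
        None
    } else {
        Some(p.sum / p.count as f64)
    }
}
```

For one bucket holding the single value 10 and another holding 1, 2, 3, the combined partial is sum 16 over count 4, so the mean is 4.0; averaging the two bucket means (10 and 2) would give 6.0 instead.
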
- InfluxDB v1 CQ scheduling (`domain/cq_schedule.rs`): bucket alignment,
  RESAMPLE EVERY/FOR validation, coverage windows, boundary-aligned
  `should_run`, execution interval derivation.
- `QueryService::execute_continuous_query`; reconstruct CQ text for replay.
- `MaterializedViewService` and `ContinuousQueryService` wired to new
  schedule metadata and bounded backfill paths.
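
Bucket alignment and a boundary-aligned `should_run` can be sketched with integer arithmetic. The code below is an illustration only, not the contents of `domain/cq_schedule.rs`: timestamps are assumed to be integer seconds, the function signatures are hypothetical, and the coverage window is assumed to end at the most recent bucket boundary.

```rust
/// Floor a timestamp to the start of its bucket. `rem_euclid` keeps the
/// result correct for timestamps before the epoch.
pub fn bucket_start(ts: i64, every: i64) -> i64 {
    assert!(every > 0, "interval must be positive");
    ts - ts.rem_euclid(every)
}

/// Run when `now` has crossed into a later bucket than the last run,
/// or when the query has never run.
pub fn should_run(last_run: Option<i64>, now: i64, every: i64) -> bool {
    match last_run {
        None => true,
        Some(last) => bucket_start(now, every) > bucket_start(last, every),
    }
}

/// Half-open window [start, end) covering `for_interval` seconds up to the
/// most recent bucket boundary (assumed semantics).
pub fn coverage_window(now: i64, every: i64, for_interval: i64) -> (i64, i64) {
    let end = bucket_start(now, every);
    (end - for_interval, end)
}
```

Comparing bucket starts rather than elapsed time means a scheduler tick that arrives a little late still triggers exactly one run per boundary.
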
- Peer/cluster: `PeerQueryService` Raft mutation forwarding, leader addr
  resolution (forward node → Raft → cluster membership → metrics leader),
  MV source/dest retention policy resolution.
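
The leader address resolution order above is a first-match fallback chain. A minimal sketch, assuming each source has already been looked up into an `Option<String>`; the `LeaderHints` struct and its field names are hypothetical.

```rust
/// Hypothetical snapshot of what each source currently reports.
#[derive(Debug, Default)]
pub struct LeaderHints {
    pub forward_node: Option<String>,
    pub raft_leader: Option<String>,
    pub membership_leader: Option<String>,
    pub metrics_leader: Option<String>,
}

impl LeaderHints {
    /// Return the first available address, in the order the description
    /// lists: forward node, Raft, cluster membership, metrics leader.
    pub fn resolve(&self) -> Option<&str> {
        self.forward_node
            .as_deref()
            .or(self.raft_leader.as_deref())
            .or(self.membership_leader.as_deref())
            .or(self.metrics_leader.as_deref())
    }
}
```

If no source knows a leader the result is `None`, which lets the caller decide whether to retry or fail the forwarded mutation.
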
- `schema_mutation_apply`: single apply path for Raft state machine,
  `/internal/replicate-mutation`, and startup metadata sync (metadata +
  chDB DDL side effects).
- RocksDB metadata adapter extended for CQ schedule fields, rollups, and
  richer measurement meta.
- Replication apply, hinted handoff, drain, bootstrap, and Raft log/state
  machine updated for prepared WAL and schema mutations.
- Expanded SHOW/DDL execution, SELECT INTO, retention policy normalization,
  tag key/value discovery from series tables, authorization checks.
- CLI 0.8.3: admin/query/export/import/repl hooks for new statement types;
  e2e test coverage extended.
- `tikv-jemallocator` with background purging: return transient startup
  heap (series dedup warm + WAL replay) to the OS instead of pinning RSS.
- Default retention sweep interval 12h (was 60s).
- Grafana dashboards refreshed (cluster, logging, machine-monitoring);
  Kind CR manifest and docker-compose aligned with new config knobs.
- `scripts/load.sh` updated for pool/Arrow WAL load testing.
- New compat suites: `combination_tests` (full parse→translate→execute
  interaction tests), `cq_tests`, `prepared_wal_tests`.
- Expanded `ddl_tests`, `query_tests`, `metadata_tests`, `http_tests`.
- Integration/raft/sync_quorum tests updated for prepared WAL and pooling.
- Bench stubs adjusted for new ingest signatures.

BREAKING CHANGE: chDB session pooling semantics changed. `pool_size` now
opens multiple same-path connections (real concurrency) instead of being
ignored/warned. Tune `pool_size` and `max_concurrent_queries` together.
New config keys: `storage.wal_format`, `flush.arrow_wal_enabled`.
Default retention interval is now 12h.

austin-barrington added 5 commits June 22, 2026 09:36

fix: add docker-compose.getting-started.yml

0726d92

feat: add a parallelized version of coalescing and WAL
chore: update docs
fix: remove tracing, as it was panicking on tokio main threads. Will revisit later.

fixing tests

139acc9

fix tests

aae14c2

fix: compatibility tests

5d6d2c8

austin-barrington merged commit a48ef9c into main Jun 24, 2026
4 checks passed

austin-barrington deleted the feat_con_pool branch July 1, 2026 21:51
