feat(core): chDB connection pool, Arrow WAL zero-copy flush, and InfluxQL parity#52
Merged
Conversation
…uxQL parity
Re-architect the write path around chDB-ready Arrow batches, enable real
same-path connection pooling for concurrent flush inserts and queries, and
close major gaps in InfluxDB v1 TimeseriesQL semantics (DDL, SHOW, CQ
scheduling, rollups/MVs, SELECT INTO). Bump workspace to 0.8.3.
This is a cross-cutting release: ingest, WAL, flush, chDB adapter, query
translation, cluster schema apply, and observability all move together so
the fast path is correct end-to-end rather than bolted on in one layer.
- Add `ChdbConnectionPool`: N independent `Connection`s to the same
`--path`, each with its own `ChdbClient` mutex (libchdb process-global
singleton per path — see chdb/insert/concurrency.rs).
- Round-robin checkout with `try_lock` on busy slots; clamp `pool_size`
to 1..=32 (default 4).
- Rewire `ChdbNativeAdapter`, `ChdbQueryAdapter`, and `ChdbSession` to
use the pool instead of a single shared session.
- Fix config/docs: `chdb.pool_size > 1` is now real parallelism, not a
deprecated no-op. Recommend `server.max_concurrent_queries >= pool_size`.
- Update system-architecture.md to describe same-path multi-connection
semantics (replacing the old per-slot subdirectory model).
- Build chDB-ready fact-table `RecordBatch`es at ingest time
(`application/arrow_ingest/` for line protocol, columnar, msgpack, points).
- Introduce `PreparedWalSlot` / `PreparedMeasurementBatch` domain types with
post-assign `ingest_seq` patching (`domain/prepared_wal.rs`).
- Coalesce sparse prepared batches per measurement (`domain/arrow_coalesce`).
- In-memory `WalArrowCache` indexes unflushed prepared slots by sequence;
bounded `take_range(from, to_inclusive)` avoids evicting post-snapshot
entries (fixes steady ~50% cache-miss under continuous load).
- Versioned on-disk WAL encoding via `wal_ipc` (`HBWA` magic, v1): optional
`storage.wal_format = "arrow_ipc"` alongside legacy bincode.
- `FlushService` flushes prepared chunks directly through
`insert_record_batch_direct` — no re-parse/re-coalesce on the hot path.
- `flush.arrow_wal_enabled` (default true) gates the RAM cache; metrics
gauge `hyperbytedb_wal_arrow_cache_entries` for growth/OOM watch.
- `application/wal_append.rs` bundles prepared slots with legacy WalEntry
for peer sync compatibility.
- `build_prepared_wal_slot`, `write_prepared_batch`, schema cache refresh
from metadata, and field-type widening reconciliation
(`ALTER TABLE ... MODIFY COLUMN` when metadata union exceeds cached types).
- Engine DDL: raw facts use `ReplacingMergeTree(ingest_seq)`; rollup/MV
destinations with additive partials use `SummingMergeTree` on sum columns.
- Pad sparse legacy WAL batches to full ensured column sets before insert.
- New HTTP `/internal/chdb` adapter hook for admin/debug paths.
- Depend on chdb-rust `feat_arrow_insert` (Arrow C Data Interface insert);
Docker builds clone that branch; root/proxy Dockerfiles stub all workspace
crates for layer caching.
- Split parser: dedicated `lexer.rs` + `ddl_parser.rs` for token-driven
InfluxQL DDL/SHOW/auth (CREATE/DROP/ALTER DB/RP/user, GRANT/REVOKE,
SHOW DATABASES/MEASUREMENTS/TAG KEYS/TAG VALUES/FIELD KEYS/SERIES/CQs/MVs).
- Major `to_clickhouse.rs` expansion:
- Raw selects always project `time`, ascending order.
- GROUP BY time defaults, fill/null/with bounds, tag ordering.
- Materialized view backfill column ordering and dest insert mapping.
- Rollup fact views (`build_coalesced_fact_view_*`): sum for additive
fields, mean → sum/count rewrite on rollup measurements.
- CQ bounded SELECT INTO translation and time-window predicate stripping.
- `predicate_sql`: shared WHERE → SQL for DELETE / DROP SERIES (local +
replication).
- `field_type` domain module; `rollup` combine semantics for MV/CQ fields.
- InfluxDB v1 CQ scheduling (`domain/cq_schedule.rs`): bucket alignment,
RESAMPLE EVERY/FOR validation, coverage windows, boundary-aligned
`should_run`, execution interval derivation.
- `QueryService::execute_continuous_query`; reconstruct CQ text for replay.
- `MaterializedViewService` and `ContinuousQueryService` wired to new
schedule metadata and bounded backfill paths.
- Peer/cluster: `PeerQueryService` Raft mutation forwarding, leader addr
resolution (forward node → Raft → cluster membership → metrics leader),
MV source/dest retention policy resolution.
- `schema_mutation_apply`: single apply path for Raft state machine,
`/internal/replicate-mutation`, and startup metadata sync (metadata +
chDB DDL side effects).
- RocksDB metadata adapter extended for CQ schedule fields, rollups, and
richer measurement meta.
- Replication apply, hinted handoff, drain, bootstrap, and Raft log/state
machine updated for prepared WAL and schema mutations.
- Expanded SHOW/DDL execution, SELECT INTO, retention policy normalization,
tag key/value discovery from series tables, authorization checks.
- CLI 0.8.3: admin/query/export/import/repl hooks for new statement types;
e2e test coverage extended.
- `tikv-jemallocator` with background purging: return transient startup
heap (series dedup warm + WAL replay) to the OS instead of pinning RSS.
- Default retention sweep interval 12h (was 60s).
- Grafana dashboards refreshed (cluster, logging, machine-monitoring);
Kind CR manifest and docker-compose aligned with new config knobs.
- `scripts/load.sh` updated for pool/Arrow WAL load testing.
- New compat suites: `combination_tests` (full parse→translate→execute
interaction tests), `cq_tests`, `prepared_wal_tests`.
- Expanded `ddl_tests`, `query_tests`, `metadata_tests`, `http_tests`.
- Integration/raft/sync_quorum tests updated for prepared WAL and pooling.
- Bench stubs adjusted for new ingest signatures.
BREAKING CHANGE: chDB session pooling semantics changed — `pool_size` now
opens multiple same-path connections (real concurrency) instead of being
ignored/warned. Tune `pool_size` and `max_concurrent_queries` together.
New config keys: `storage.wal_format`, `flush.arrow_wal_enabled`.
Default retention interval is now 12h.
feat: add a parallelized version of coalesing and WAL chore: update docs fix: remove tracing as it was panicing tokio main threads. Shall re-visit later.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Re-architect the write path around chDB-ready Arrow batches, enable real
same-path connection pooling for concurrent flush inserts and queries, and
close major gaps in InfluxDB v1 TimeseriesQL semantics (DDL, SHOW, CQ
scheduling, rollups/MVs, SELECT INTO). Bump workspace to 0.8.3.
This is a cross-cutting release: ingest, WAL, flush, chDB adapter, query
translation, cluster schema apply, and observability all move together so
the fast path is correct end-to-end rather than bolted on in one layer.
Add
ChdbConnectionPool: N independentConnections to the same--path, each with its ownChdbClientmutex (libchdb process-globalsingleton per path — see chdb/insert/concurrency.rs).
Round-robin checkout with
try_lockon busy slots; clamppool_sizeto 1..=32 (default 4).
Rewire
ChdbNativeAdapter,ChdbQueryAdapter, andChdbSessiontouse the pool instead of a single shared session.
Fix config/docs:
chdb.pool_size > 1is now real parallelism, not adeprecated no-op. Recommend
server.max_concurrent_queries >= pool_size.Update system-architecture.md to describe same-path multi-connection
semantics (replacing the old per-slot subdirectory model).
Build chDB-ready fact-table
RecordBatches at ingest time(
application/arrow_ingest/for line protocol, columnar, msgpack, points).Introduce
PreparedWalSlot/PreparedMeasurementBatchdomain types withpost-assign
ingest_seqpatching (domain/prepared_wal.rs).Coalesce sparse prepared batches per measurement (
domain/arrow_coalesce).In-memory
WalArrowCacheindexes unflushed prepared slots by sequence;bounded
take_range(from, to_inclusive)avoids evicting post-snapshotentries (fixes steady ~50% cache-miss under continuous load).
Versioned on-disk WAL encoding via
wal_ipc(HBWAmagic, v1): optionalstorage.wal_format = "arrow_ipc"alongside legacy bincode.FlushServiceflushes prepared chunks directly throughinsert_record_batch_direct— no re-parse/re-coalesce on the hot path.flush.arrow_wal_enabled(default true) gates the RAM cache; metricsgauge
hyperbytedb_wal_arrow_cache_entriesfor growth/OOM watch.application/wal_append.rsbundles prepared slots with legacy WalEntryfor peer sync compatibility.
build_prepared_wal_slot,write_prepared_batch, schema cache refreshfrom metadata, and field-type widening reconciliation
(
ALTER TABLE ... MODIFY COLUMNwhen metadata union exceeds cached types).Engine DDL: raw facts use
ReplacingMergeTree(ingest_seq); rollup/MVdestinations with additive partials use
SummingMergeTreeon sum columns.Pad sparse legacy WAL batches to full ensured column sets before insert.
New HTTP
/internal/chdbadapter hook for admin/debug paths.Depend on chdb-rust
feat_arrow_insert(Arrow C Data Interface insert);Docker builds clone that branch; root/proxy Dockerfiles stub all workspace
crates for layer caching.
Split parser: dedicated
lexer.rs+ddl_parser.rsfor token-drivenInfluxQL DDL/SHOW/auth (CREATE/DROP/ALTER DB/RP/user, GRANT/REVOKE,
SHOW DATABASES/MEASUREMENTS/TAG KEYS/TAG VALUES/FIELD KEYS/SERIES/CQs/MVs).
Major
to_clickhouse.rsexpansion:time, ascending order.build_coalesced_fact_view_*): sum for additivefields, mean → sum/count rewrite on rollup measurements.
predicate_sql: shared WHERE → SQL for DELETE / DROP SERIES (local +replication).
field_typedomain module;rollupcombine semantics for MV/CQ fields.InfluxDB v1 CQ scheduling (
domain/cq_schedule.rs): bucket alignment,RESAMPLE EVERY/FOR validation, coverage windows, boundary-aligned
should_run, execution interval derivation.QueryService::execute_continuous_query; reconstruct CQ text for replay.MaterializedViewServiceandContinuousQueryServicewired to newschedule metadata and bounded backfill paths.
Peer/cluster:
PeerQueryServiceRaft mutation forwarding, leader addrresolution (forward node → Raft → cluster membership → metrics leader),
MV source/dest retention policy resolution.
schema_mutation_apply: single apply path for Raft state machine,/internal/replicate-mutation, and startup metadata sync (metadata +chDB DDL side effects).
RocksDB metadata adapter extended for CQ schedule fields, rollups, and
richer measurement meta.
Replication apply, hinted handoff, drain, bootstrap, and Raft log/state
machine updated for prepared WAL and schema mutations.
Expanded SHOW/DDL execution, SELECT INTO, retention policy normalization,
tag key/value discovery from series tables, authorization checks.
CLI 0.8.3: admin/query/export/import/repl hooks for new statement types;
e2e test coverage extended.
tikv-jemallocatorwith background purging: return transient startupheap (series dedup warm + WAL replay) to the OS instead of pinning RSS.
Default retention sweep interval 12h (was 60s).
Grafana dashboards refreshed (cluster, logging, machine-monitoring);
Kind CR manifest and docker-compose aligned with new config knobs.
scripts/load.shupdated for pool/Arrow WAL load testing.New compat suites:
combination_tests(full parse→translate→executeinteraction tests),
cq_tests,prepared_wal_tests.Expanded
ddl_tests,query_tests,metadata_tests,http_tests.Integration/raft/sync_quorum tests updated for prepared WAL and pooling.
Bench stubs adjusted for new ingest signatures.
BREAKING CHANGE: chDB session pooling semantics changed —
pool_sizenowopens multiple same-path connections (real concurrency) instead of being
ignored/warned. Tune
pool_sizeandmax_concurrent_queriestogether.New config keys:
storage.wal_format,flush.arrow_wal_enabled.Default retention interval is now 12h.