
persist: replace file-per-entry with WAL and refactor into generic indexedWAL#3044

Merged
wen-coding merged 1 commit into main from wen/use_wal_for_persistence on Mar 24, 2026

Conversation

Contributor

@wen-coding wen-coding commented Mar 9, 2026

Summary

Replace file-per-block and file-per-commitQC persistence with sei-db/wal, extract common WAL mechanics into a generic indexedWAL[T], and expose truncate-then-append APIs so the caller (avail/state) controls all parallelism.

New: wal.go — generic indexedWAL[T]

Generic wrapper around sei-db/wal with monotonic index tracking, typed serialization via codec[T], and lifecycle methods: Write, ReadAll, TruncateBefore (with verify callback and bounds validation), TruncateAll, FirstIdx, Count, Close. Enforces INVARIANT: firstIdx <= nextIdx. Opens with AllowEmpty: true (critical for correct empty-log arithmetic). ReadAll includes a post-replay count check to detect silent data loss. Fsync enabled for additional durability beyond the prune anchor.
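The index arithmetic described above can be sketched with a minimal in-memory stand-in. The real type wraps sei-db/wal and serializes via codec[T]; the field and method names below mirror the PR description but are illustrative, not the actual implementation:

```go
package main

import "fmt"

// codec is a hypothetical typed serializer, shown only for shape;
// the real indexedWAL passes encoded bytes to sei-db/wal.
type codec[T any] interface {
	Encode(T) ([]byte, error)
	Decode([]byte) (T, error)
}

// indexedWAL sketches monotonic index tracking over an append-only log,
// using an in-memory slice in place of the on-disk WAL.
type indexedWAL[T any] struct {
	entries  []T
	firstIdx uint64 // index of entries[0]; INVARIANT: firstIdx <= nextIdx
}

func (w *indexedWAL[T]) nextIdx() uint64 { return w.firstIdx + uint64(len(w.entries)) }

// Write appends an entry and returns its index.
func (w *indexedWAL[T]) Write(v T) uint64 {
	w.entries = append(w.entries, v)
	return w.nextIdx() - 1
}

func (w *indexedWAL[T]) Count() uint64    { return uint64(len(w.entries)) }
func (w *indexedWAL[T]) FirstIdx() uint64 { return w.firstIdx }

// TruncateBefore drops all entries with index < idx, with bounds validation.
func (w *indexedWAL[T]) TruncateBefore(idx uint64) error {
	if idx < w.firstIdx || idx > w.nextIdx() {
		return fmt.Errorf("truncate index %d out of range [%d, %d]", idx, w.firstIdx, w.nextIdx())
	}
	w.entries = w.entries[idx-w.firstIdx:]
	w.firstIdx = idx
	return nil
}
```

Note how `AllowEmpty`-style behavior falls out of the arithmetic: after truncating everything, `firstIdx == nextIdx` and `Count() == 0`, with no sentinel values needed.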

blocks.go — per-lane WAL persistence

Three-layer structure: laneWALState (lock-free helpers) → laneWAL (per-lane mutex) → BlockPersister (lane map + RWMutex).

  • One WAL per lane in blocks/<hex_lane_id>/ subdirectories with independent per-lane truncation.
  • laneWALState provides lock-free methods (persistBlock, truncateForAnchor, loadAll) that assume the caller holds the per-lane lock. truncateForAnchor derives the prune cursor from the CommitQC, verifying block numbers via a defense-in-depth callback. Handles TruncateAll when the anchor advanced past all persisted blocks and re-anchors nextBlockNum. loadAll detects gaps at replay time.
  • laneWAL wraps a laneWALState with a utils.Mutex, providing maybePruneAndPersist and close. The lock is held for the entire truncate-then-append sequence.
  • MaybePruneAndPersistLane (public): truncate-then-append API for a single lane. Creates lane WALs lazily via getOrCreateLane (double-checked locking under utils.RWMutex), then delegates to laneWAL.maybePruneAndPersist. Does not spawn goroutines — the caller schedules parallelism. No-op persister (dir=None) skips disk I/O but still invokes afterEach.
  • close() unexported — only used by tests and constructor error cleanup; uses errors.Join.
  • Removed: public PersistBlock, DeleteBefore, DeleteBeforeLane, LaneIDsWithWAL.
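The lazy lane creation mentioned above can be sketched as follows. This is a simplified stand-in: LaneID is reduced to a string, laneWAL is stubbed, and sync.RWMutex stands in for utils.RWMutex; only the double-checked-locking shape is the point:

```go
package main

import "sync"

// LaneID and laneWAL are simplified stand-ins for the real types.
type LaneID string

type laneWAL struct{ lane LaneID }

// BlockPersister sketch: a lane map guarded by an RWMutex. Lanes are
// only ever added, never removed, which is what makes the read-locked
// fast path (and releasing it before taking the per-lane lock) safe.
type BlockPersister struct {
	mu    sync.RWMutex
	lanes map[LaneID]*laneWAL
}

// getOrCreateLane uses double-checked locking: an optimistic read-locked
// lookup, then a write-locked re-check before creating the lane WAL.
func (bp *BlockPersister) getOrCreateLane(lane LaneID) *laneWAL {
	bp.mu.RLock()
	lw, ok := bp.lanes[lane]
	bp.mu.RUnlock()
	if ok {
		return lw
	}
	bp.mu.Lock()
	defer bp.mu.Unlock()
	if lw, ok := bp.lanes[lane]; ok { // re-check: another goroutine may have won
		return lw
	}
	lw = &laneWAL{lane: lane}
	bp.lanes[lane] = lw
	return lw
}
```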

commitqcs.go — single CommitQC WAL

Two-layer structure: commitQCState (lock-free helpers) → CommitQCPersister (single mutex).

  • Single WAL in commitqcs/, linear RoadIndex-to-WAL-index mapping via FirstIdx().
  • commitQCState provides lock-free methods: persistCommitQC silently ignores duplicates (idx < next) for idempotent startup, rejects gaps (idx > next). deleteBefore truncates the WAL, handles TruncateAll + cursor advance for anchor-past-all, and re-persists the anchor QC for crash recovery.
  • MaybePruneAndPersist (public): truncate-then-append API. Holds the lock for the entire sequence — truncates below the anchor, then appends new QCs with afterEach callback. Callers need not coordinate ordering.
  • Close() is idempotent.
  • loadAllCommitQCs detects gaps at replay time.
  • Removed: public PersistCommitQC, DeleteBefore, CommitQCPersistBatch struct, ResetNext.
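The duplicate/gap handling in persistCommitQC reduces to cursor arithmetic, sketched below. The real struct wraps an indexedWAL and persists typed CommitQC entries; here the cursor (`next`, name assumed) and a slice stand in for the WAL:

```go
package main

import "fmt"

// commitQCState sketch: only the cursor logic, not the on-disk WAL.
type commitQCState struct {
	next uint64   // next expected index
	log  []uint64 // stand-in for persisted entries
}

// persistCommitQC silently ignores duplicates (idx < next) so startup
// replay is idempotent, and rejects gaps (idx > next) outright.
func (s *commitQCState) persistCommitQC(idx uint64) error {
	switch {
	case idx < s.next:
		return nil // duplicate: already persisted
	case idx > s.next:
		return fmt.Errorf("gap: got index %d, expected %d", idx, s.next)
	}
	s.log = append(s.log, idx)
	s.next++
	return nil
}
```

The asymmetry is deliberate: a duplicate means the caller replayed something already durable (harmless), while a gap means an entry was skipped and the linear RoadIndex-to-WAL-index mapping would be silently corrupted.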

state.go — orchestration and parallelism

  • NewState startup: prunes stale WAL entries via MaybePruneAndPersistLane (per committee lane) and MaybePruneAndPersist with anchor-only (nil proposals), replacing the old DeleteBefore calls.
  • runPersist write order:
    1. Prune anchor persisted sequentially — establishes crash-recovery watermark.
    2. scope.Parallel fans out: one task for MaybePruneAndPersist (commit-QCs), one task per committee lane for MaybePruneAndPersistLane (blocks). No early cancellation; first error returned after all tasks finish. Each path invokes markCommitQCsPersisted / markBlockPersisted per entry so voting unblocks ASAP.
  • anchorQC typed as utils.Option[*types.CommitQC] — drives both commit-QC and block WAL truncation.
  • blocksByLane built from batch; all committee lanes iterated for scope.Parallel so truncation runs even for lanes with no new blocks.
  • persistBatch simplified: removed laneFirsts and commitQCFirst (derivable from anchorQC).
  • No goroutines inside the persist package; all concurrency managed by state.go.

inner.go

  • TODO added for dynamic committee support (new members need entries in blocks, votes, nextBlockToPersist, persistedBlockStart).
  • nextBlockToPersist doc clarifies per-lane-per-batch notification frequency.

Design decisions

  • No-op persisters: when stateDir is None, disk I/O is skipped but cursors advance via afterEach callbacks, preventing runPersist from spinning.
  • Infallible afterEach: callbacks update in-memory state only, so they cannot fail. This avoids a latent bug where a failed callback after a successful WAL write would stall the cursor.
  • Stale lane removal deferred: with epoch-based fixed committees, lane cleanup is unnecessary until dynamic committee changes are supported.
  • Go 1.22+ loop variable semantics: go.mod requires Go 1.25+, so per-iteration loop variables are guaranteed — no explicit v := v in closures.

Concurrency design

Both BlockPersister and CommitQCPersister are internally thread-safe. Mutable state is protected with utils.Mutex/utils.RWMutex. Both hold their respective locks for the full truncate-then-append sequence, so callers need not coordinate ordering.

BlockPersister — two-level locking:

  • utils.RWMutex[map[LaneID]*laneWAL] on the lanes map: getOrCreateLane uses double-checked locking; close() takes write Lock.
  • utils.Mutex[*laneWALState] per lane: serializes writes and truncations within laneWAL.maybePruneAndPersist. Different lanes are fully parallel.
  • NOTE: MaybePruneAndPersistLane releases the map RLock before acquiring the per-lane lock. This is safe because lanes are only added, never removed.

CommitQCPersister — single lock: utils.Mutex[*commitQCState] protecting the WAL handle and cursor. MaybePruneAndPersist holds the lock for truncation, anchor re-persist, and all appends.

Test plan

  • All autobahn tests pass (avail, consensus, persist, data, types)
  • Race detector clean (-race) on persist and avail packages
  • gofmt, go vet, golangci-lint: 0 issues
  • TestState (no-op persist) and TestStateWithPersistence (disk persist with prune cycles)
  • TestStateRestartFromPersisted (full restart from WAL data)
  • Edge cases: anchor-past-all truncation, crash recovery, gap detection, concurrent multi-lane writes, no-op persister cursor advancement

@github-actions

github-actions bot commented Mar 9, 2026

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

Build: ✅ passed · Format: ✅ passed · Lint: ✅ passed · Breaking: ✅ passed · Updated (UTC): Mar 24, 2026, 3:12 PM

@wen-coding wen-coding changed the title from "persist: refactor block/commitqc WAL into generic indexedWAL" to "persist: replace file-per-entry with WAL and refactor into generic indexedWAL" on Mar 9, 2026
@codecov

codecov bot commented Mar 9, 2026

Codecov Report

❌ Patch coverage is 74.83660% with 77 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.61%. Comparing base (491d2c6) to head (33720ff).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
...mint/internal/autobahn/consensus/persist/blocks.go 76.97% 22 Missing and 10 partials ⚠️
...dermint/internal/autobahn/consensus/persist/wal.go 63.93% 12 Missing and 10 partials ⚠️
...t/internal/autobahn/consensus/persist/commitqcs.go 78.20% 11 Missing and 6 partials ⚠️
sei-tendermint/internal/autobahn/avail/state.go 78.57% 4 Missing and 2 partials ⚠️
Additional details and impacted files


@@           Coverage Diff            @@
##             main    #3044    +/-   ##
========================================
  Coverage   58.60%   58.61%            
========================================
  Files        2096     2097     +1     
  Lines      173400   173502   +102     
========================================
+ Hits       101621   101696    +75     
- Misses      62739    62761    +22     
- Partials     9040     9045     +5     
Flag Coverage Δ
sei-chain-pr 78.57% <74.83%> (?)
sei-db 70.41% <ø> (ø)

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
sei-tendermint/internal/autobahn/avail/inner.go 97.36% <ø> (ø)
sei-tendermint/internal/autobahn/avail/state.go 77.41% <78.57%> (+1.73%) ⬆️
...t/internal/autobahn/consensus/persist/commitqcs.go 78.16% <78.20%> (-0.57%) ⬇️
...dermint/internal/autobahn/consensus/persist/wal.go 63.93% <63.93%> (ø)
...mint/internal/autobahn/consensus/persist/blocks.go 75.83% <76.97%> (-1.31%) ⬇️

Comment on lines +155 to +168
for lane, first := range laneFirsts {
	lw, ok := bp.lanes[lane]
	if !ok {
		continue // no WAL yet; PersistBlock will create one lazily
	}
	firstBN, ok := lw.firstBlockNum().Get()
	if !ok || first <= firstBN {
		continue
	}
	walIdx := lw.firstIdx + uint64(first-firstBN)
	if err := lw.TruncateBefore(walIdx); err != nil {
		return fmt.Errorf("truncate lane %s WAL before block %d: %w", lane, first, err)
	}
}

Check warning

Code scanning / CodeQL

Iteration over map

Iteration over map may be a possible source of non-determinism
@wen-coding wen-coding requested a review from pompon0 March 10, 2026 23:52
dbwal.Config{
WriteBufferSize: 0, // synchronous writes
WriteBatchSize: 1, // no batching
FsyncEnabled: true,
Contributor

@yzang2019 yzang2019 Mar 11, 2026


Be aware of the latency impact here: if this write is on the critical path, it will introduce noticeable latency. I'd recommend synchronous write + no fsync for performance. Fsync does provide stronger guarantees, but the chance of all validators losing power at the same time is pretty rare.

Contributor Author


That's reasonable, changed

Contributor Author


@yzang2019 Talked with @pompon0 about this, since lane block QC is only f+1, theoretically one OS crash can screw us. We are changing all lanes and commitqc writes to happen concurrently, so latency is less of a concern. Is it okay if I change the Fsync back to true here?

Contributor

@yzang2019 yzang2019 Mar 23, 2026


Sure, if latency is less of a concern, we can start with fsync=true. But I'd recommend adding a metric/log around this to measure how long the flush takes, in case latency becomes a problem in the future.

Contributor Author


Sounds good, this PR is already too big, I'll add metrics in a future PR.

if err := w.wal.Write(entry); err != nil {
return err
}
if w.firstIdx == 0 {
Contributor


Recommend using Count() == 0 instead of relying on firstIdx == 0 as a sentinel for "WAL is empty", in case the assumption that the WAL starting index is 0 stops holding if we switch the WAL library in the future.

Contributor Author


done

@wen-coding wen-coding force-pushed the wen/use_wal_for_persistence branch from c0727e6 to 422aa8f Compare March 11, 2026 04:29
if first >= lw.nextBlockNum {
// Anchor advanced past all persisted blocks for this lane.
if err := lw.TruncateAll(); err != nil {
Contributor


Is truncation synchronous? How expensive is it?

Contributor Author


TruncateAll is synchronous: it removes the segment files and then updates some internal pointers, so it's not too expensive. I don't expect this to happen very often, though. In practice, if every validator keeps emitting blocks, there should always be 1 or 2 lane blocks that are generated but not in an AppQC yet. Unless they do "generate a block then wait for a while, generate another block then wait for a while" — are you imagining that as an attack?

Contributor


I'm interested in the happy path. It would be nice to make sure that removing blocks does not affect the latency of persisting blocks. What I want to avoid is doing synchronously: remove block -> fsync -> insert block -> fsync (if we have a steady stream of 1 insertion/removal of a block per batch), in which case the latency of insertion is 2x larger than it should be (2 fsyncs).

@wen-coding wen-coding force-pushed the wen/use_wal_for_persistence branch from aeeddbf to 8de2a83 Compare March 16, 2026 17:04
@wen-coding wen-coding requested a review from pompon0 March 17, 2026 17:45
@wen-coding wen-coding force-pushed the wen/use_wal_for_persistence branch from b6be8e4 to 273ab80 Compare March 17, 2026 17:47
@wen-coding wen-coding requested a review from pompon0 March 19, 2026 16:07
@wen-coding wen-coding force-pushed the wen/use_wal_for_persistence branch 8 times, most recently from a870b63 to 8921433 Compare March 23, 2026 21:44
func (s *laneWALState) truncateForAnchor(lane types.LaneID, qc *types.CommitQC) error {
first := qc.LaneRange(lane).First()
firstBN, ok := s.firstBlockNum().Get()
if !ok || first <= firstBN {
Contributor


If firstBlockNum is None, this function does not set nextBlockNum correctly when an anchor is present.

Contributor Author


Good catch, fixed.

persist: replace file-per-entry with WAL and refactor into generic indexedWAL

Replaces the file-per-entry persistence for blocks and CommitQCs with
WAL-based persistence (sei-db/wal), wrapped in a generic indexedWAL[T]
that tracks monotonic indices and typed entries.

Key changes:
- indexedWAL[T]: generic WAL wrapper with Write, TruncateBefore,
  TruncateAll, ReadAll, and index tracking.
- BlockPersister: three-layer architecture (laneWALState -> laneWAL ->
  BlockPersister) with per-lane WALs for independent truncation.
  MaybePruneAndPersistLane holds the per-lane lock for the entire
  truncate-then-append sequence.
- CommitQCPersister: two-layer architecture (commitQCState ->
  CommitQCPersister). MaybePruneAndPersist holds the lock for the
  entire truncate-then-append sequence.
- afterEach callbacks use utils.Option[func(T)] and are documented as
  being invoked under lock.
- truncateForAnchor takes BlockNumber directly instead of CommitQC,
  allowing testDeleteBefore to call it without duplicating logic.
- Fix: truncateForAnchor on an empty WAL now advances nextBlockNum
  when the anchor is ahead.

Made-with: Cursor
@wen-coding wen-coding force-pushed the wen/use_wal_for_persistence branch from 157112c to 33720ff Compare March 24, 2026 15:11
@wen-coding wen-coding enabled auto-merge March 24, 2026 15:12
@wen-coding wen-coding added this pull request to the merge queue Mar 24, 2026
Merged via the queue into main with commit 1141939 Mar 24, 2026
39 of 40 checks passed
@wen-coding wen-coding deleted the wen/use_wal_for_persistence branch March 24, 2026 16:25
