fix(gstack-gbrain-sync): restore startLockKeepalive to prevent stale-lock takeover#1372
Closed
marko-durasic wants to merge 1 commit intogarrytan:mainfrom
Closed
Conversation
…lock takeover
`acquireLock` still classifies any lock file with `mtime` older than
`STALE_LOCK_MS` (5 minutes) as stale and takes it over. The keepalive
that refreshed the lock file's mtime every 60 s while a long sync ran
was removed at some point. With `--full` documented at "honest ~25-35
min for big Macs (ED2)," any second invocation arriving after 5 minutes
will incorrectly take the lock — two sync processes then race on the
shared state JSON (`~/.gstack/.gbrain-sync-state.json`).
This commit re-introduces:
- `utimesSync` import from `fs`
- `startLockKeepalive()` — refreshes lock mtime every 60 s while the
PID matches the lock owner, best-effort error handling.
- `lockKeepalive` interval handle in `main()`, started right after
`acquireLock()` succeeds and cleared in the `cleanup` closure
(also covered by SIGINT/SIGTERM cleanup).
No behavior change for `--dry-run` (still skips lock entirely) or for
short `--incremental` runs (steady-state ~50 ms — the keepalive simply
never fires).
Verified locally with `bun build bin/gstack-gbrain-sync.ts` (clean
bundle, 18.20 KB entry point).
Co-authored-by: Cursor <cursoragent@cursor.com>
Author
|
Sibling PR: #1373 (Tier 1 |
4 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
bin/gstack-gbrain-sync.tskeeps the 5-minute stale-lock takeover inacquireLock(filemtimeolder thanSTALE_LOCK_MS = 5 * 60 * 1000⇒unlinkSync(LOCK_PATH)and proceed), but the keepalive that refreshed the lock file'smtimeevery 60 s while a long sync was running has been removed.--fullmode is documented in the file's own header as "honest ~25-35 min for big Macs (ED2)." Any concurrent/sync-gbraininvocation arriving more than 5 minutes after a--fullrun started will:mtimeis "stale,"unlinkSyncit.LOCK_PATHand proceed.runMemoryIngest/runBrainSyncPushandsaveSyncStateagainst~/.gstack/.gbrain-sync-state.jsonsimultaneously — the atomictmp + renamewrite protects each individual save, but two writers still race; the last writer wins, the other one'slast_stagesis lost, and partial federated state can leak between runs.Fix
Re-introduces three small things:
utimesSyncimport in thefsimport line.startLockKeepalive()function right afterreleaseLock()— refreshes lock mtime every 60 s, no-op if the lock file is gone or owned by a different PID, errors are swallowed (best-effort).lockKeepalivehandle inmain()— started right afteracquireLock()returns true; cleared in the existingcleanupclosure (which is also run on SIGINT/SIGTERM).Diff is +24/-1 lines, fully scoped to
bin/gstack-gbrain-sync.ts.Behavior matrix
--dry-run--incremental(~50 ms)--full(25-35 min)mtime, getsEEXISTonwriteFileSync(... { flag: "wx" }), exits with code 2 and the documented "another /sync-gbrain is running" errorTest plan
bun build bin/gstack-gbrain-sync.ts --target=bun— clean bundle, 18.20 KB entry point, no type errors.cleanupclosure (interval is cleared before lock is released; both ordering paths are safe).--fullsmoke test (would require a populated gbrain corpus; not executed here).Discovery context
Surfaced while verifying behavior against the DuReef vendored mirror of
gstack(we keep aninspiration/gstack/tree pinned to a known-good SHA). The mirror atdb9447c3had this function; comparing againstmainHEAD (443bde05) showed it had been removed while the 5-min staleness check still pointed at it implicitly. Filing here so DuReef and any other downstream consumer doing long--fullruns doesn't hit the race after the next mirror sync.Companion DuReef-side bookkeeping: DuReef/workspace#49 — read-only watch entry in our
PORT-CANDIDATES.mdso reviewers know to refuse this regression on nextchore(inspiration)sync if this upstream PR isn't merged first.Out of scope
VERSION/CHANGELOG.mdbump (left to maintainer's release cadence).gstack-gbrain-detect's Tier 1\$mtypecase (filed separately).Made with Cursor
Need help on this PR? Tag
@codesmithwith what you need.