test(agent-004): add vitest unit tests for helper math #100
Mirrors the AGENT-002 / AGENT-003 structure to give AGENT-004 Validator Monitor the same standalone TypeScript implementation:

- `agent-004-validator-monitor/`: standalone Node.js process
- `src/index.ts`: banner, cycle runner, polling vs single-run mode
- `src/config.ts`: config + thresholds mirrored from the character
- `src/types.ts`: staking / slashing / gov types + per-workflow shapes
- `src/ledger.ts`: LCD client for staking, slashing, gov endpoints
- `src/store.ts`: SQLite state (token snapshots, commission history, scorecards, decentralization snapshots, workflow executions)
- `src/ooda.ts`: generic OODA executor (same shape as agent-002/003)
- `src/monitor.ts`: Claude narrative layer, one function per workflow
- `src/output.ts`: console + optional Discord webhook dispatcher
- `src/workflows/performance-tracking.ts`: WF-VM-01
- `src/workflows/delegation-flow-analysis.ts`: WF-VM-02
- `src/workflows/decentralization-monitor.ts`: WF-VM-03

Scoring (M014):

- Uptime component: signed_blocks_window − missed_blocks_counter, weighted at config.validator.scoreWeightUptime (default 400)
- Governance component: votes cast / recent finalized proposals, weighted at config.validator.scoreWeightGovernance (default 350)
- Stability component: full weight minus penalties for jailing (100) and commission changes in the trailing window (40 each), floored at 0, weighted at config.validator.scoreWeightStability (default 250)
- Composite = sum, 0..1000, PoA eligible when >= 800

Decentralization (WF-VM-03):

- Nakamoto coefficient uses the 33.4% halt-threshold convention
- Gini index uses the textbook formula on token amounts normalized to uregen
- Health classification from Nakamoto floor (<=5 CRITICAL, <=8 WARNING), Gini ceiling (>=0.65 WARNING), and single-validator concentration (>=33% CRITICAL, >=20% WARNING)

Delegation flows (WF-VM-02):

- Snapshot each validator's `tokens` field every cycle
- Derive flows from the delta against the previous snapshot
- Whale-sized movements (>= 100,000 REGEN by default) tagged in the summary and raise the alert level

Determinism: all scoring, Nakamoto, Gini, health, and whale detection are computed in plain TypeScript. Claude is only used for the narrative layer. Keeps the agent cheap, reproducible, and auditable, and makes the hard parts trivially unit-testable in a follow-up PR.

Deliberate MVP proxies (documented in the agent README):

1. Token-delta as the delegation source for WF-VM-02 until a real MsgDelegate / MsgUndelegate / MsgRedelegate tx-stream client lands.
2. Governance participation currently scores 0 for every validator because the operator→delegator bech32 conversion is deferred to a follow-up PR. Relative composites still surface real differences in uptime and stability without asymmetric penalty.
3. MVP signing-info join on the raw consensus key: falls back to assuming 100% uptime when no match is found, which under-counts real issues rather than smearing a healthy validator.

CI: the new agent is added to the `agents` job so `npx tsc --noEmit` runs against it on every PR, matching the existing agent-002-governance-analyst wiring.

- Lands in: `agent-004-validator-monitor/`, `.github/workflows/ci.yml`
- Changes: new standalone AGENT-004 process with 3 workflows (WF-VM-01/02/03)
- Validate: `cd agent-004-validator-monitor && npm ci && npx tsc --noEmit`

Refs phase-2/2.2-agentic-workflows.md §WF-VM-01, §WF-VM-02, §WF-VM-03
Refs agents/packages/agents/src/characters/validator-monitor.ts (regen-network#64)
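The M014 scoring rules above can be sketched as a small pure function. This is a minimal sketch, not the agent's code: the `ScoreInputs` shape and the function name are hypothetical, and it assumes the jailing and commission penalties are subtracted as points from the 250-point stability weight, which is one reading of the commit message.

```typescript
// Hypothetical input shape; field names are illustrative only.
interface ScoreInputs {
  signedBlocksWindow: number;
  missedBlocksCounter: number;
  votesCast: number;
  finalizedProposals: number;
  jailed: boolean;
  commissionChanges: number;
}

// Default weights and penalties as quoted in the commit message.
const WEIGHTS = { uptime: 400, governance: 350, stability: 250 };
const JAIL_PENALTY = 100;
const COMMISSION_CHANGE_PENALTY = 40;
const POA_THRESHOLD = 800;

const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

function compositeScoreSketch(v: ScoreInputs) {
  // Assumption: with no signing window available, uptime falls back to 100%.
  const uptimeRatio =
    v.signedBlocksWindow > 0
      ? (v.signedBlocksWindow - v.missedBlocksCounter) / v.signedBlocksWindow
      : 1;
  const uptime = Math.round(clamp01(uptimeRatio) * WEIGHTS.uptime);

  // Assumption: no finalized proposals means no governance credit.
  const govRatio =
    v.finalizedProposals > 0 ? v.votesCast / v.finalizedProposals : 0;
  const governance = Math.round(clamp01(govRatio) * WEIGHTS.governance);

  // Assumption: penalties are points taken off the stability weight, floored at 0.
  const penalty =
    (v.jailed ? JAIL_PENALTY : 0) +
    v.commissionChanges * COMMISSION_CHANGE_PENALTY;
  const stability = Math.max(0, WEIGHTS.stability - penalty);

  const composite = uptime + governance + stability;
  return {
    uptime,
    governance,
    stability,
    composite,
    poaEligible: composite >= POA_THRESHOLD,
  };
}
```

Under these assumptions, a governance component of 0 caps the composite at 650, which is below the 800 eligibility line.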
Sibling PR to the AGENT-003 unit tests (regen-network#99) and a follow-up to PR regen-network#81, which promised the unit tests as a "separate test-only PR so this PR stays a single-concern 'add the agent' change."

Adds 33 unit tests across 2 test files covering every deterministic helper in the AGENT-004 workflows. The helpers are the core of the decentralization analysis surface: if they drift silently, the validator monitor produces misleading alerts or misses real concentration attacks.

## Changes

### Helper exports

Five previously module-private helpers are now exported so the test files can import them:

    decentralization-monitor.ts: nakamotoCoefficient, giniIndex, topNSharePct, classifyHealth
    delegation-flow-analysis.ts: absBig

The export is the only production-code change: no behavior change, no API rename. Module consumers are unchanged.

### Test files

    src/workflows/decentralization-monitor.test.ts (28 tests)
      nakamotoCoefficient: 8 tests
        - empty input / zero total returns 0
        - single validator with entire stake → 1
        - top validator > 33.4% → 1
        - top validator exactly 33.4% (334/1000) → 2
          (pins the STRICT `> threshold` predicate; a refactor that
          changes > to >= would silently produce Nakamoto = 1 here)
        - top two combined clear threshold → 2
        - ten equal validators → 4
        - degenerate case when total > sum of list
      giniIndex: 7 tests
        - empty / single-element → 0
        - perfect equality → 0
        - unequal distribution > 0
        - maximally unequal (one holds everything) → approaches 1
        - all-zeros → 0 (cumulative guard)
        - monotonicity: more inequality → higher Gini
      topNSharePct: 6 tests
        - zero total → 0
        - top 1, top 3 cumulative share
        - n > array length → 100%
        - n = array length → 100%
        - two-decimal precision
      classifyHealth: 7 tests
        - HEALTHY baseline
        - CRITICAL on Nakamoto floor
        - CRITICAL on single-validator concentration
        - WARNING on Nakamoto warning floor
        - WARNING on Gini ceiling
        - WARNING on single-validator warning concentration
        - CRITICAL wins over WARNING when thresholds overlap

    src/workflows/delegation-flow-analysis.test.ts (5 tests)
      absBig: 5 tests
        - zero
        - positive inputs
        - negative inputs
        - values beyond Number.MAX_SAFE_INTEGER (2^53 + 1); critical
          because AGENT-004 works in uregen and 221M REGEN is 2.21e14
          uregen, close to the unsafe-integer boundary
        - idempotence

### Vitest setup

Same structure as AGENT-003's test PR (regen-network#99):

    vitest.config.ts   standard config, node env
    package.json       adds "test" / "test:watch" + vitest ^2.1.0
    tsconfig.json      excludes *.test.ts from prod typecheck
    .gitignore         adds *.db-shm and *.db-wal

## Validation

    $ cd agent-004-validator-monitor && npm test
    Test Files  2 passed (2)
    Tests       33 passed (33)

    $ cd agent-004-validator-monitor && npx tsc --noEmit
    (exit 0)

Note: the tests import decentralization-monitor.ts, which in turn imports store.ts at the top level. The store constructor opens a SQLite database on import. If a prior test run left stale WAL lock files on disk, the next run fails with "database is locked". The .gitignore update prevents those lock files from landing in a PR; developers running tests locally may need to rm -f agent-004.db-shm agent-004.db-wal once if they hit the lock.

## Scope

Does NOT touch the OODA loops, the Claude narrative layer, the LCD client, the SQLite store, or any output formatting. Tests cover pure functions only.

- Lands in: `agent-004-validator-monitor/`
- Changes: 33 unit tests + vitest setup + 5 helper exports
- Validate: `cd agent-004-validator-monitor && npm test`

## PR relationship

Based on PR regen-network#81's branch. If regen-network#81 merges first, this PR rebases cleanly. Sibling PR to regen-network#99 (AGENT-003 unit tests); the two follow an identical structure and review together better than separately.
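The `classifyHealth` cases listed above follow directly from the thresholds quoted in the AGENT-004 commit message (Nakamoto <= 5 CRITICAL, <= 8 WARNING; Gini >= 0.65 WARNING; single-validator share >= 33% CRITICAL, >= 20% WARNING). A minimal sketch of that precedence, assuming a three-argument signature that may differ from the exported helper:

```typescript
type Health = "HEALTHY" | "WARNING" | "CRITICAL";

// Hypothetical signature; the real classifyHealth may take a different shape.
function classifyHealthSketch(
  nakamoto: number,
  gini: number,
  topValidatorSharePct: number
): Health {
  // CRITICAL checks run first so they win when thresholds overlap.
  if (nakamoto <= 5 || topValidatorSharePct >= 33) return "CRITICAL";
  if (nakamoto <= 8 || gini >= 0.65 || topValidatorSharePct >= 20) {
    return "WARNING";
  }
  return "HEALTHY";
}
```

Checking the CRITICAL conditions before the WARNING ones is what makes the "CRITICAL wins over WARNING" case hold without any extra ranking logic.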
Code Review
This pull request introduces AGENT-004, a validator monitor for the Regen Network that tracks performance, delegation flows, and decentralization using an OODA loop architecture and SQLite storage. Feedback identifies a critical bug in the performance tracking workflow where mismatched key formats prevent the detection of validator downtime. Additionally, the review points out precision loss in Gini index calculations, logic errors in the retrieval of historical decentralization snapshots and commission changes, and an opportunity to improve testability by making the database path configurable.
    const signing = signingByConsAddrLike.get(
      v.consensus_pubkey?.key || ""
    );
The lookup in signingByConsAddrLike will always fail because the keys do not match. info.address (used as the map key in observe) is a bech32-encoded consensus address (e.g., regenvalcons1...), while v.consensus_pubkey?.key is a base64-encoded public key. To correctly link a validator to its signing info, you must derive the consensus address from the public key (SHA256 hash of the pubkey bytes, truncated to 20 bytes, then bech32 encoded). As currently implemented, missedBlocks will always default to 0, rendering the performance monitor unable to detect downtime.
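The derivation described in this comment (SHA256 of the pubkey bytes, first 20 bytes, bech32 under the consensus HRP) can be sketched as follows. This is an illustration under assumptions: it assumes an ed25519 consensus key and the `regenvalcons` HRP implied by the `regenvalcons1...` example, and it inlines a small BIP-173 encoder so the block runs with only `node:crypto`. The function names are hypothetical; a real implementation would more likely lean on an existing bech32 library.

```typescript
import { createHash } from "node:crypto";

const CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l";
const GEN = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3];

// BIP-173 checksum polynomial.
function polymod(values: number[]): number {
  let chk = 1;
  for (let k = 0; k < values.length; k++) {
    const top = chk >>> 25;
    chk = ((chk & 0x1ffffff) << 5) ^ values[k];
    for (let i = 0; i < 5; i++) {
      if ((top >>> i) & 1) chk ^= GEN[i];
    }
  }
  return chk;
}

function hrpExpand(hrp: string): number[] {
  const codes = Array.from(hrp).map((c) => c.charCodeAt(0));
  return [...codes.map((c) => c >>> 5), 0, ...codes.map((c) => c & 31)];
}

function bech32Encode(hrp: string, words: number[]): string {
  const mod = polymod([...hrpExpand(hrp), ...words, 0, 0, 0, 0, 0, 0]) ^ 1;
  const checksum = [0, 1, 2, 3, 4, 5].map((i) => (mod >>> (5 * (5 - i))) & 31);
  return hrp + "1" + [...words, ...checksum].map((w) => CHARSET[w]).join("");
}

// Regroup 8-bit bytes into 5-bit words, padding the final word with zeros.
function toWords(bytes: Uint8Array): number[] {
  const words: number[] = [];
  let acc = 0;
  let bits = 0;
  for (let k = 0; k < bytes.length; k++) {
    acc = (acc << 8) | bytes[k];
    bits += 8;
    while (bits >= 5) {
      bits -= 5;
      words.push((acc >>> bits) & 31);
    }
    acc &= (1 << bits) - 1; // keep only the leftover bits
  }
  if (bits > 0) words.push((acc << (5 - bits)) & 31);
  return words;
}

// Assumption: ed25519 consensus key, address = first 20 bytes of SHA256(pubkey).
function consAddressFromPubkeySketch(
  pubkeyBase64: string,
  hrp = "regenvalcons"
): string {
  const pubkey = Buffer.from(pubkeyBase64, "base64");
  const digest = createHash("sha256").update(pubkey).digest();
  return bech32Encode(hrp, toWords(digest.subarray(0, 20)));
}
```

With an address derived this way, the map built from `info.address` and the per-validator lookup key would be in the same format, so the join can actually match.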
    const total = tokensBig.reduce((acc, t) => acc + t, 0n);
    const sortedDesc = [...tokensBig].sort((a, b) => (a > b ? -1 : a < b ? 1 : 0));

    const tokensNum = active.map((v) => Number(BigInt(v.tokens || "0") / 1_000_000n));
Dividing by 1_000_000n before converting to Number causes significant precision loss for the Gini index calculation. Validators with stakes less than 1 REGEN (1,000,000 uregen) will be treated as having 0 stake, and differences in fractional REGEN amounts will be ignored. Since JS numbers can safely represent integers up to 2^53 - 1 (approx. 9 quadrillion), and the total REGEN supply is well below this limit even in uregen, you should convert the full uregen amount to Number to maintain precision.
    - const tokensNum = active.map((v) => Number(BigInt(v.tokens || "0") / 1_000_000n));
    + const tokensNum = active.map((v) => Number(v.tokens || "0"));
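To make the precision point concrete, here is a small sketch that computes a Gini index twice on the same stakes, once after integer-dividing by 1,000,000 and once on the full uregen values. The `giniSketch` function uses one common textbook form of the formula; whether it matches the agent's `giniIndex` term for term is an assumption, and the sample stakes are made up for illustration.

```typescript
// Textbook Gini on ascending values x_1..x_n:
//   G = 2 * sum(i * x_i) / (n * sum(x_i)) - (n + 1) / n
function giniSketch(values: number[]): number {
  const n = values.length;
  if (n < 2) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const total = sorted.reduce((acc, v) => acc + v, 0);
  if (total === 0) return 0; // guard against dividing by zero
  const weighted = sorted.reduce((acc, v, i) => acc + (i + 1) * v, 0);
  return (2 * weighted) / (n * total) - (n + 1) / n;
}

// Illustrative stakes, all below 1 REGEN (1,000,000 uregen).
const sampleTokens = ["900000", "500000", "250000"];

// Pre-dividing floors every stake to 0, so all inequality disappears.
const preDivided = sampleTokens.map((t) => Number(BigInt(t) / 1_000_000n));

// Converting the full uregen amount keeps the differences.
const fullPrecision = sampleTokens.map((t) => Number(t));

const giniPreDivided = giniSketch(preDivided);
const giniFull = giniSketch(fullPrecision);
```

In this toy case the pre-divided series collapses to all zeros and reports perfect equality, while the full-precision series reports a Gini of roughly 0.26.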
    FROM decentralization_snapshots ORDER BY id DESC LIMIT 1 OFFSET 1`
    )
    .get() as
      | {
Using OFFSET 1 here is incorrect because getLatestDecentralizationSnapshot is called in the orient phase before the current cycle's snapshot is saved in the act phase. Consequently, OFFSET 1 skips the most recent available snapshot (from the previous cycle) and returns the one before it. This results in trend comparisons being shifted by one cycle and failing entirely on the second run of the agent (where only one record exists).
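The ordering problem can be reproduced without a database. The sketch below stands in an in-memory array for the `decentralization_snapshots` table and models one cycle as "read in orient, then write in act"; the `SnapshotRow` shape and both function names are hypothetical.

```typescript
// Hypothetical row shape standing in for the snapshots table.
interface SnapshotRow {
  id: number;
  nakamoto: number;
}

// Equivalent of: ORDER BY id DESC LIMIT 1 OFFSET <offset>
function latestWithOffset(rows: SnapshotRow[], offset: number): SnapshotRow | null {
  const desc = [...rows].sort((a, b) => b.id - a.id);
  return desc[offset] ?? null;
}

// One agent cycle: orient reads the previous snapshot, act writes the new one.
function runCycleSketch(
  rows: SnapshotRow[],
  nakamoto: number,
  offset: number
): SnapshotRow | null {
  const previous = latestWithOffset(rows, offset); // orient: read first
  rows.push({ id: rows.length + 1, nakamoto }); // act: write afterwards
  return previous;
}

// Three cycles with OFFSET 1: the comparison lags by one cycle.
const withOffset1: SnapshotRow[] = [];
const seenOffset1 = [7, 8, 9].map((n) => runCycleSketch(withOffset1, n, 1)?.id ?? null);

// Three cycles with no offset: each cycle sees the one directly before it.
const withOffset0: SnapshotRow[] = [];
const seenOffset0 = [7, 8, 9].map((n) => runCycleSketch(withOffset0, n, 0)?.id ?? null);
```

Here `seenOffset1` comes out as `[null, null, 1]` and `seenOffset0` as `[null, 1, 2]`, which matches the comment: with `OFFSET 1` the second run has nothing to compare against and later runs compare against the cycle before last.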
    WHERE operator_address = ? AND captured_at >= ?`
    )
    .get(operatorAddress, sinceIso) as { cnt: number };
    return Math.max(0, row.cnt - 1);
The cnt - 1 logic for counting commission changes is flawed when the baseline record (the state before the window) falls outside the sinceIso range. If a validator changed their commission once within the last 30 days, but their previous state was recorded 40 days ago, cnt will be 1 and this function will return 0, missing the change. A more robust approach would be to count all records in the window and check if a record exists prior to the window to determine if the first record in the window constitutes a change.
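The baseline-aware approach suggested here can be sketched over an in-memory list of history rows. It assumes a row is only recorded when the commission rate is first seen or differs from the previous one, and that timestamps are ISO-8601 UTC strings in one consistent format so they compare lexicographically; the `CommissionRow` shape and the function name are hypothetical.

```typescript
// Hypothetical history row: written on first sight and on each rate change.
interface CommissionRow {
  capturedAt: string; // ISO-8601 UTC, e.g. "2024-02-20T00:00:00.000Z"
  rate: string;
}

function countCommissionChangesSinceSketch(
  rows: CommissionRow[],
  sinceIso: string
): number {
  const inWindow = rows.filter((r) => r.capturedAt >= sinceIso).length;
  const hasBaselineBeforeWindow = rows.some((r) => r.capturedAt < sinceIso);
  // With an earlier baseline, every in-window row is a real change.
  // Without one, the first in-window row is itself the baseline.
  return hasBaselineBeforeWindow ? inWindow : Math.max(0, inWindow - 1);
}

// The reviewer's scenario: baseline before the window, one change inside it.
const reviewerCase: CommissionRow[] = [
  { capturedAt: "2024-01-01T00:00:00.000Z", rate: "0.05" },
  { capturedAt: "2024-02-20T00:00:00.000Z", rate: "0.10" },
];
const changesInReviewerCase = countCommissionChangesSinceSketch(
  reviewerCase,
  "2024-02-01T00:00:00.000Z"
);
```

For the reviewer's scenario this returns 1, where the plain `cnt - 1` path returns 0.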
    constructor() {
      this.db = new Database(DB_PATH);
      this.db.pragma("journal_mode = WAL");
The database path is hardcoded in the Store constructor, which makes it difficult to run unit tests without side effects on the local filesystem or hitting database locks (as noted in the PR description). Consider making the database path an optional parameter in the constructor, defaulting to DB_PATH. This would allow tests to use :memory: or a temporary file path, improving test isolation and reliability.
    - constructor() {
    -   this.db = new Database(DB_PATH);
    -   this.db.pragma("journal_mode = WAL");
    + constructor(dbPath: string = DB_PATH) {
    +   this.db = new Database(dbPath);
    +   this.db.pragma("journal_mode = WAL");
Replaces the token-delta MVP proxy with a real staking tx-search client that reads recent MsgDelegate, MsgUndelegate, and MsgBeginRedelegate events from the Cosmos LCD. Closes the follow-up documented in PR regen-network#81's design-decision regen-network#2.

## What changes in the workflow

The observe phase no longer snapshots `validator.tokens` or consults the previous snapshot. Instead:

    const events = await ledger.getRecentDelegationTxs(200);

The orient phase aggregates events per-validator via a new pure function `aggregateEventsToFlows`, which handles three rules:

- delegate → inflow to event.validator
- undelegate → outflow from event.validator
- redelegate → outflow from event.sourceValidator, inflow to event.validator (destination)

`summarizeFlows` (also new and exported for tests) derives the totals, whale count, top inflow, and top outflow from the flow list. Neither function touches the store; the old per-cycle token snapshot is no longer needed. The monikers for the narrative layer are still backfilled from a single `ledger.getValidators()` call inside the orient phase.

## What changes in the ledger client

New methods on LedgerClient:

- `getRecentDelegationTxs(limit)`: queries the LCD tx-search endpoint once per staking message type (three type URLs total) and flattens the results into a single DelegationEvent list. Per-type failures are isolated: if the MsgUndelegate query fails for any reason, the MsgDelegate and MsgBeginRedelegate results still come through.
- `parseDelegationEventsFromTx(tx)`: public pure function. Walks events at both `logs[].events[]` and top-level `tx.events[]` positions for cross-SDK compatibility. Matches three Cosmos SDK event types: `delegate`, `unbond`, and `redelegate`. Extracts the delegator address from the positionally-corresponding `message` event sender.
- New helper `parseCoinAmount(raw)` extracts the numeric prefix from Cosmos coin-amount strings like "1000uregen", returning the numeric part as a string for BigInt-safe downstream consumption.

## New types

- `DelegationEvent`: a single on-chain staking event with txHash, eventType, delegator, validator, sourceValidator (only set for redelegate), amountUregen, and occurredAt. The DelegationFlow type is preserved for backward compatibility with the narrative layer.

## New tests: 22 total (55 total across 3 files, up from 33 in regen-network#100)

### src/ledger.test.ts (9 new tests)

- empty tx
- MsgDelegate extraction with sender from message event
- MsgUndelegate extraction via the `unbond` event type
- MsgBeginRedelegate with source + destination validators
- batched: 3 staking events in one tx with positional sender matching
- missing validator attribute ignored
- malformed amount attribute ignored
- coin-amount format "<uint>uregen" parsed correctly
- events read from tx.events[] alongside logs[].events[]

### src/workflows/delegation-flow-analysis.test.ts (13 new tests, on top of the existing 5 for absBig)

- aggregateEventsToFlows:
  - empty
  - single delegate → inflow
  - single undelegate → outflow
  - redelegate → two flows (source + destination)
  - net delegate + undelegate on same validator
  - zero net delta skipped
  - whale threshold tagging
  - zero/negative amount skipped
  - non-numeric amount skipped
- summarizeFlows:
  - zero totals on empty
  - inflow / outflow / net sum correctness
  - top inflow + top outflow identification
  - whale count separate from total

## What this unlocks

The old MVP proxy couldn't:

- Distinguish delegate from undelegate from redelegate. A validator losing 100K stake could be a pure outflow (undelegate) or a redelegate to a different validator, but the proxy saw the same delta number either way.
- Attribute flows to a specific delegator address.
- Capture intra-cycle movements that net to zero (A delegates, B undelegates same amount within a minute; the old proxy reported "no change" and missed both events).
- Produce a reliable audit trail linking back to real tx hashes.

The new implementation fixes all four.

## Scope

Does NOT touch WF-VM-01 (performance tracking) or WF-VM-03 (decentralization monitor). The bech32 operator→delegator conversion for governance participation scoring is a separate follow-up (the m014 governance score is still MVP-zero after this PR).

- Lands in: `agent-004-validator-monitor/`
- Changes: new tx-search client + new DelegationEvent type + rewritten WF-VM-02 observe+orient phases + 22 new tests
- Validate: `cd agent-004-validator-monitor && npm test && npx tsc --noEmit`

## PR relationship

Based on PR regen-network#100 (AGENT-004 unit tests) which is based on PR regen-network#81 (AGENT-004 initial implementation). Sibling to PR regen-network#103 (AGENT-003 MsgRetire tx-stream). The two real-tx-stream PRs close the MVP-proxy column for both market-monitor and validator-monitor in the same session.

Refs `phase-2/2.2-agentic-workflows.md` §WF-VM-02
Refs PR regen-network#81's design decision regen-network#2 (MVP token-delta proxy → real tx-stream follow-up)
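The three aggregation rules from this description (delegate, undelegate, redelegate) can be sketched as a pure function over a trimmed-down event type. This is a sketch under assumptions, not the PR's `aggregateEventsToFlows`: the `DelegationEventSketch` and `FlowSketch` shapes are cut-down stand-ins, and tagging whales on the absolute net flow per validator at the 100,000 REGEN default is a guess at how the threshold is applied.

```typescript
type StakingEventType = "delegate" | "undelegate" | "redelegate";

// Trimmed-down stand-in for DelegationEvent; only the fields used here.
interface DelegationEventSketch {
  eventType: StakingEventType;
  validator: string; // destination validator for a redelegate
  sourceValidator?: string; // only set for redelegate
  amountUregen: string;
}

interface FlowSketch {
  validator: string;
  netUregen: bigint; // positive = inflow, negative = outflow
  isWhale: boolean;
}

// 100,000 REGEN expressed in uregen (default threshold from the AGENT-004 notes).
const WHALE_THRESHOLD_UREGEN = 100_000n * 1_000_000n;

// Accept only strictly positive decimal integers; everything else is skipped.
function parsePositiveAmount(raw: string): bigint | null {
  if (!/^\d+$/.test(raw)) return null;
  const value = BigInt(raw);
  return value > 0n ? value : null;
}

function aggregateEventsSketch(events: DelegationEventSketch[]): FlowSketch[] {
  const net = new Map<string, bigint>();
  const add = (validator: string, delta: bigint): void => {
    net.set(validator, (net.get(validator) ?? 0n) + delta);
  };

  for (const e of events) {
    const amount = parsePositiveAmount(e.amountUregen);
    if (amount === null) continue;
    if (e.eventType === "delegate") {
      add(e.validator, amount);
    } else if (e.eventType === "undelegate") {
      add(e.validator, -amount);
    } else if (e.sourceValidator) {
      add(e.sourceValidator, -amount); // outflow from the source
      add(e.validator, amount); // inflow to the destination
    }
  }

  const flows: FlowSketch[] = [];
  net.forEach((netUregen, validator) => {
    if (netUregen === 0n) return; // zero net delta is skipped
    const magnitude = netUregen < 0n ? -netUregen : netUregen;
    flows.push({ validator, netUregen, isWhale: magnitude >= WHALE_THRESHOLD_UREGEN });
  });
  return flows;
}
```

Keeping the amounts as `bigint` end to end means the whale comparison stays exact in uregen.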
…T, loop, commission

Addresses Gemini review feedback on PR regen-network#81:

WF-VM-01 (performance-tracking):

* Derive the validator's regenvalcons1… consensus address from its consensus pubkey (SHA256 of pubkey bytes → first 20 bytes → bech32 under the consensus HRP) and use that as the signingByConsAddrLike lookup key. The old code was joining a base64 pubkey string against a bech32 address, so the lookup always missed and every validator reported 0 missed blocks; uptime scored 100% across the board.
* Hoist the trailing-window `sinceIso` out of the per-validator loop so we do not recompute the same Date arithmetic N times per cycle.
* Drop the dead `operatorToAccountBech32` stub and use the real `operatorToDelegator` helper from the new bech32 module.

WF-VM-03 (decentralization-monitor):

* Gini index now converts the full uregen value to Number without pre-dividing by 1_000_000n. Integer division floored every validator with less than 1 REGEN to zero and discarded fractional REGEN for larger stakes, both of which distort the Gini. uregen fits inside Number.MAX_SAFE_INTEGER safely, so no precision is lost by keeping the full value.

Store:

* countCommissionChangesSince now checks whether a baseline row exists *before* the window and counts the in-window rows accordingly. The old `cnt - 1` path dropped a real commission change whenever the baseline read fell outside the window.
* getLatestDecentralizationSnapshot uses `LIMIT 1` (no OFFSET). The caller runs in the `orient` phase before the current cycle's snapshot is written, so the newest row is the actual previous cycle's snapshot. `OFFSET 1` was skipping it and either comparing against the cycle before last or returning null on the second run.
* Store constructor accepts an optional dbPath (defaulting to the on-disk DB file) so unit tests can pass `:memory:` without clobbering the shared DB or hitting the "database is locked" failure mode.

New src/bech32.ts module:

* consensusPubkeyToConsAddress, operatorToDelegator, delegatorToOperator built on the `bech32` npm package. Centralized so WF-VM-01 and the forthcoming WF-VM-02 real tx-stream (PR regen-network#104) share one implementation.
* Added `bech32: ^2.0.0` dependency to agent-004/package.json.

Main loop:

* setInterval → recursive setTimeout so a slow cycle cannot overlap with the next tick.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
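The main-loop change can be illustrated with a short sketch of the recursive `setTimeout` pattern: the next tick is only scheduled after the current cycle has settled, so a slow cycle delays the following one instead of overlapping with it. The `startLoopSketch` name, its signature, and the stop handle are hypothetical and not taken from `src/index.ts`.

```typescript
// Runs `runCycle` repeatedly, waiting `intervalMs` AFTER each cycle finishes.
// Returns a function that stops the loop.
function startLoopSketch(
  runCycle: () => Promise<void>,
  intervalMs: number
): () => void {
  let stopped = false;
  let timer: ReturnType<typeof setTimeout> | undefined;

  const tick = async (): Promise<void> => {
    if (stopped) return;
    try {
      await runCycle();
    } catch (err) {
      // Assumption: a failed cycle is logged and the loop keeps going.
      console.error("cycle failed:", err);
    }
    // Schedule the next tick only once this cycle has completed.
    if (!stopped) timer = setTimeout(tick, intervalMs);
  };

  void tick();

  return () => {
    stopped = true;
    if (timer !== undefined) clearTimeout(timer);
  };
}
```

With `setInterval`, a cycle that takes longer than the interval would have a second cycle start on top of it; here the gap between cycles is always at least `intervalMs`.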
…orkflow fixes

Addresses Gemini review feedback on PR regen-network#100:

* The Store singleton is now constructed lazily via a Proxy, so importing a workflow file for its exported helpers (which is what decentralization-monitor.test.ts and delegation-flow-analysis.test.ts do) no longer opens the SQLite DB as a side effect. That eliminates the "database is locked" failure mode when the test suites run in parallel.
* The Store class already accepts an optional `dbPath` (see the PR regen-network#81 fix that this commit cherry-picks on top) so a future test can construct its own `new Store(":memory:")` without touching the shared DB file.
* Also cherry-picks the PR regen-network#81 Gemini-review fixes so this branch is self-consistent: bech32-derived consensus address for uptime lookup, Gini precision via full uregen Number, countCommissionChangesSince baseline-aware logic, OFFSET 1 → OFFSET 0 in getLatestDecentralizationSnapshot, recursive setTimeout main loop.

Full vitest suite: 33/33 passing. Typecheck clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
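A lazily constructed singleton behind a `Proxy` can be sketched like this. `StoreSketch` is a stand-in that counts constructions instead of opening SQLite, and `lazyStore` is a hypothetical name; the commit's actual wiring may differ in detail.

```typescript
// Stand-in for the real Store: counts constructions instead of opening a DB.
class StoreSketch {
  static constructed = 0;
  readonly dbPath: string;

  constructor(dbPath: string = "agent-004.db") {
    this.dbPath = dbPath;
    StoreSketch.constructed++;
  }

  describe(): string {
    return `store at ${this.dbPath}`;
  }
}

let instance: StoreSketch | undefined;

function getStore(): StoreSketch {
  if (instance === undefined) instance = new StoreSketch();
  return instance;
}

// Importing this binding constructs nothing; the first property access does.
const lazyStore = new Proxy({} as StoreSketch, {
  get(_target, prop) {
    const real = getStore();
    const value = Reflect.get(real, prop, real);
    // Bind methods so `this` points at the real instance, not the proxy target.
    return typeof value === "function" ? value.bind(real) : value;
  },
});
```

A test file that only imports pure helpers never touches `lazyStore`, so no database handle is opened; a test that does need a store can still construct its own `new StoreSketch(":memory:")`-style instance separately.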
Hey @brawlaphant — auditing the agent-004 stack: #100, #104, and #105 all branch off the same two-commit base (workflow + unit tests) but then diverge in parallel:
Merging any one will conflict the other two. The cleanest path is probably:
Let me know which you'd prefer; we're stalled on this stack until then.
Consolidated into #105 per @glandua's audit comment. #105 already contained #104's tx-stream + bech32/M014 governance scoring; the only piece of #100 not in #105 was commit Validation on the consolidated #105:
Closing this in favor of #105.
Summary
Sibling PR to #99 (AGENT-003 unit tests) and a follow-up to #81 which promised the unit tests as a "separate test-only PR so this PR stays a single-concern 'add the agent' change."
Adds 33 unit tests across 2 test files covering every deterministic helper in the AGENT-004 workflows — Nakamoto coefficient, Gini index, top-N share, health classification, and BigInt absolute value. These helpers are the core of the decentralization analysis surface; if they drift silently, the validator monitor produces misleading alerts or misses real concentration attacks.
Test coverage
`decentralization-monitor.test.ts` — 28 tests
The Nakamoto coefficient test that pins the strict `> threshold` predicate is the single most important test in this PR. A refactor that changes `>` to `>=` would silently flip edge cases — if a validator exactly hits 33.4%, today Nakamoto is 2 (need a second validator to break the strict threshold), but a loose-inequality refactor would return 1 and the network's apparent decentralization would suddenly appear worse without the underlying reality changing.
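The edge case is easy to see in a sketch that does the 33.4% comparison in integer arithmetic. This is an illustration, not the exported `nakamotoCoefficient`: the `inclusive` flag exists only to show what a loose-inequality refactor would do, and returning the list length when the threshold is never crossed is an assumption about the degenerate case.

```typescript
// Counts how many of the largest validators are needed to exceed 33.4% of total.
// `sortedDesc` must already be sorted from largest to smallest stake.
function nakamotoSketch(
  sortedDesc: bigint[],
  total: bigint,
  inclusive = false // false = strict `>`, true = loose `>=` (for comparison only)
): number {
  if (sortedDesc.length === 0 || total <= 0n) return 0;
  let cumulative = 0n;
  for (let i = 0; i < sortedDesc.length; i++) {
    cumulative += sortedDesc[i];
    // cumulative / total vs 334 / 1000, cross-multiplied to stay in integers.
    const lhs = cumulative * 1000n;
    const rhs = total * 334n;
    if (inclusive ? lhs >= rhs : lhs > rhs) return i + 1;
  }
  // Assumption: if the threshold is never crossed, report every validator.
  return sortedDesc.length;
}

// Top validator sits at exactly 33.4% of a 1000-unit total.
const exactEdge = [334n, 333n, 333n];
const strictResult = nakamotoSketch(exactEdge, 1000n); // strict predicate
const looseResult = nakamotoSketch(exactEdge, 1000n, true); // loose predicate
```

On the exact-edge input the strict predicate yields 2 and the loose one yields 1, which is the flip the pinned test guards against.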
`delegation-flow-analysis.test.ts` — 5 tests
The beyond-safe-integer test is critical: AGENT-004 operates in uregen, and 221M REGEN is 2.21×10^14 uregen. That is still below the JS safe-integer boundary (2^53 − 1 ≈ 9×10^15), but only by a factor of about 40, so a Math.abs-on-Number approach would start rounding as soon as an amount is summed or scaled past that boundary. Using BigInt and a BigInt-native abs keeps the math exact at mainnet scale.
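A short sketch shows the difference between a BigInt-native absolute value and a round trip through `Number` at 2^53 + 1. The `absBigSketch` name is hypothetical; only the comparison logic is the point.

```typescript
// BigInt-native absolute value: exact for any magnitude.
function absBigSketch(x: bigint): bigint {
  return x < 0n ? -x : x;
}

// 2^53 + 1, the first integer a Number cannot represent exactly.
const beyondSafe = BigInt("9007199254740993");

// Exact: stays in BigInt the whole way.
const exactAbs = absBigSketch(-beyondSafe);

// Lossy: converting to Number rounds to the nearest representable double
// before Math.abs ever runs.
const lossyAbs = BigInt(Math.abs(Number(-beyondSafe)));

// 221M REGEN expressed in uregen, for scale against Number.MAX_SAFE_INTEGER.
const supplyUregen = 221_000_000n * 1_000_000n;
const headroom = BigInt(Number.MAX_SAFE_INTEGER) / supplyUregen;
```

Here `exactAbs` equals `beyondSafe`, while `lossyAbs` comes back one lower (9007199254740992), and `headroom` works out to 40.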
Helper exports
Five previously module-private helpers are now exported so the test files can import them. No behavior change.
Vitest setup
Same structure as #99:
Note on SQLite locking
The tests import `decentralization-monitor.ts`, which imports `store.ts` at module top level, and `store.ts` opens a SQLite database in the Store constructor. If a prior test run crashed and left stale WAL lock files on disk, the next run fails with "database is locked". The `.gitignore` update prevents those lock files from landing in a future PR. Developers running tests locally may need to `rm -f agent-004.db-shm agent-004.db-wal` once if they hit the lock.
Test plan
PR relationship
Based on #81's branch (AGENT-004 implementation). If #81 merges first, this PR rebases cleanly. Sibling to #99 (AGENT-003 unit tests).