fix: re-enable legacy metrics reporter for audit bootstrap #275
Merged
mateeullahmalik merged 1 commit into master on Mar 10, 2026
The audit module's epoch-end recovery requires peer observations from active probers. When the module was first activated on testnet, all supernodes running v2.4.5-testnet had already been POSTPONED by the legacy staleness handler: after upgrading they stopped submitting MsgReportSupernodeMetrics, and the handler postponed them ~500 blocks later, before the chain upgrade. This created a deadlock:

- Recovery needs peer observations from active probers.
- No active probers exist (active_supernode_accounts is empty in every anchor).
- POSTPONED SNs submit epoch reports but cannot recover.
- The 3 SNs on old releases bounce ACTIVE↔POSTPONED via legacy metrics, but are always POSTPONED at epoch start (anchor freeze time).

Fix: run the legacy metrics reporter alongside the audit host_reporter. Legacy MsgReportSupernodeMetrics recovers POSTPONED SNs to ACTIVE mid-epoch. Since they also submit audit epoch reports, the audit EndBlocker won't re-postpone them (a report exists, host minimums are disabled, and there is no peer-port streak). They survive the epoch end and appear ACTIVE in the next epoch anchor, bootstrapping the peer-observation cycle for all remaining POSTPONED SNs.

Once the active set stabilizes, the legacy reporter can be removed in a future release.
j-rafique added a commit to LumeraProtocol/lumera that referenced this pull request on Mar 10, 2026:
Add unit and system tests that reproduce the testnet deadlock where all supernodes are POSTPONED and the epoch anchor has an empty active set. Without active probers, peer observations cannot be generated, making audit recovery impossible.

Tests:
- TestEnforceEpochEnd_EmptyActiveSet_PostponedCannotRecover (unit): proves that compliant host-only reports from POSTPONED SNs are insufficient for recovery when no peer observations exist.
- TestEnforceEpochEnd_LegacyRecoveredSN_SurvivesWithReport (unit): proves that SNs recovered to ACTIVE mid-epoch (via legacy metrics) with audit reports survive the EndBlocker enforcement.
- TestAuditEmptyActiveSetDeadlock_HostOnlyReportsCannotRecover (system): full-chain E2E that registers SNs, misses epoch 0, then submits host-only reports for 3 epochs; all SNs remain POSTPONED throughout.
- TestAuditEmptyActiveSetBootstrap_LegacyMetricsBreaksDeadlock (system): full-chain E2E with the same deadlock setup, after which legacy metrics recovery breaks the deadlock; the SNs survive enforcement and remain ACTIVE.

Ref: LumeraProtocol/supernode#275
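The two unit tests pin down a single epoch-end decision. Here is a toy version of that decision, assuming simplified, hypothetical types (`EpochFacts`, `enforceEpochEnd`) and a minimum of one peer observation for recovery; the audit module's real state fields and thresholds are not given in this PR.

```go
package main

import "fmt"

// Status is a hypothetical stand-in for the chain's supernode state.
type Status string

const (
	Active    Status = "ACTIVE"
	Postponed Status = "POSTPONED"
)

// EpochFacts is an illustrative summary of what the epoch-end check sees for
// one supernode; the field names are not the audit module's real types.
type EpochFacts struct {
	Status           Status // status at epoch end, after any mid-epoch legacy recovery
	HasEpochReport   bool   // a MsgSubmitEpochReport was submitted for this epoch
	PeerObservations int    // observations from probers in the anchor's active set
	PeerPortStreak   bool   // a failing peer-port streak has been recorded
}

// enforceEpochEnd approximates the rules described in this PR, assuming host
// minimums are disabled:
//   - an ACTIVE node stays ACTIVE if it filed a report and has no peer-port streak;
//   - a POSTPONED node recovers only with a report AND at least minPeerObs peer
//     observations, which an empty active set can never supply.
func enforceEpochEnd(f EpochFacts, minPeerObs int) Status {
	if f.Status == Active {
		if f.HasEpochReport && !f.PeerPortStreak {
			return Active
		}
		return Postponed
	}
	if f.HasEpochReport && f.PeerObservations >= minPeerObs {
		return Active
	}
	return Postponed
}

func main() {
	// Like TestEnforceEpochEnd_EmptyActiveSet_PostponedCannotRecover: a compliant
	// host-only report, but no probers means zero peer observations.
	stuck := EpochFacts{Status: Postponed, HasEpochReport: true}
	// Like TestEnforceEpochEnd_LegacyRecoveredSN_SurvivesWithReport: recovered to
	// ACTIVE mid-epoch by legacy metrics, and an epoch report was filed.
	recovered := EpochFacts{Status: Active, HasEpochReport: true}

	fmt.Println(enforceEpochEnd(stuck, 1))     // POSTPONED
	fmt.Println(enforceEpochEnd(recovered, 1)) // ACTIVE
}
```

The asymmetry is the whole deadlock: staying ACTIVE needs only the node's own report, while leaving POSTPONED needs other nodes to be ACTIVE first.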
mateeullahmalik added a commit that referenced this pull request on Mar 11, 2026: This reverts commit 235d45f.
Problem
The audit module's epoch-end recovery for POSTPONED supernodes requires peer observations from active probers. On testnet, a deadlock exists where:
- All supernodes running v2.4.5-testnet were already POSTPONED before the chain upgrade (legacy staleness kicked them ~500 blocks after they stopped submitting MsgReportSupernodeMetrics)
- The epoch anchor's active_supernode_accounts has been empty since epoch 1, so no active probers exist

Fix
Re-enable the legacy supernode_metrics reporter alongside the audit host_reporter. Both run in parallel:
- MsgReportSupernodeMetrics → instant recovery from POSTPONED → ACTIVE (mid-epoch)
- MsgSubmitEpochReport → prevents re-postponement at epoch end (report exists, host minimums disabled)
- SN appears in the next epoch's active_supernode_accounts → becomes a prober

Evidence (testnet lumera-testnet-2)

After stabilization
Once the active set is stable (1-2 epochs), the legacy reporter can be removed in a future release.