Fix ValidatorSetFromProto crash on nil proposer in stored state#3102
Draft
Fix ValidatorSetFromProto crash on nil proposer in stored state#3102
Conversation
When a ValidatorSet is serialized via ToProto() with an empty/nil set, it produces a proto with no Proposer field. On deserialization, ValidatorSetFromProto() unconditionally tried to deserialize the Proposer, which fails with 'nil validator' when GetProposer() returns nil. This roundtrip asymmetry causes nodes to fail startup with: error creating node: fromProto: validatorSet proposer error: nil validator Fix: when the proto has no Proposer but has validators, derive the proposer using findProposer() (same logic as GetProposer() fallback). When both validators and proposer are absent, let ValidateBasic() return the appropriate empty set error. Co-authored-by: Masih H. Derkani <m@derkani.org>
|
The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3102 +/- ##
=======================================
Coverage 58.58% 58.59%
=======================================
Files 2096 2096
Lines 173413 173418 +5
=======================================
+ Hits 101601 101616 +15
+ Misses 62757 62748 -9
+ Partials 9055 9054 -1
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
- ValidatorSet.ToProto(): return nil instead of &tmproto.ValidatorSet{}
for nil/empty sets. The old empty-but-non-nil proto had a nil Proposer
field that ValidatorSetFromProto could not decode, creating a
serialization roundtrip that silently succeeds on write but crashes on
read.
- LightBlock.ToProto(): return ErrValidatorSetEmpty when a present
ValidatorSet serializes to nil, catching semantically invalid states
at serialization time rather than letting them propagate.
- Add TestValidatorSetFromProto_NilProposerWithValidators: directly
constructs a proto with valid validators but nil Proposer and verifies
the proposer is correctly derived via findProposer().
- Add TestValidatorSetFromProto_EmptyProtoRoundtrip: verifies ToProto
on an empty set returns nil and FromProto(nil) returns an error.
Co-authored-by: Masih H. Derkani <m@derkani.org>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Describe your changes and provide context
Fixes the node startup crash:
Root cause
There is a serialization/deserialization asymmetry between
ValidatorSet.ToProto()andValidatorSetFromProto():ToProto()returned&tmproto.ValidatorSet{}(non-nil, empty proto with nil Proposer) when the validator set was nil or emptyValidatorSetFromProto()checks if the proto is nil, but did not handle a non-nil proto with a nil Proposer field — it unconditionally calledValidatorFromProto(vp.GetProposer()), which fails whenGetProposer()returns nilThis means any state saved with a nil/empty ValidatorSet (e.g.
LastValidatorsduring bootstrap, state sync edge cases, or proto version mismatches across binary upgrades) produced a proto that could never be loaded back, preventing the node from starting.Fix (two sides)
1. Root cause —
ValidatorSet.ToProto(): Returnnilinstead of&tmproto.ValidatorSet{}for nil/empty sets. A nil proto field is correctly handled everywhere: proto serialization omits it, andValidatorSetFromProto(nil)returns a clear "nil validator set" error. The old empty-but-non-nil proto silently succeeded on write but crashed on read.2. Backward compat —
ValidatorSetFromProto(): Handle nil Proposer gracefully for already-persisted data. If the proto has validators but no Proposer, derive the proposer usingfindProposer()(same fallback already used byGetProposer()). If both are absent,ValidateBasic()returnsErrValidatorSetEmpty.3. Propagation —
LightBlock.ToProto(): ReturnErrValidatorSetEmptywhen a present-but-empty ValidatorSet serializes to nil, catching semantically invalid light blocks at serialization time.Testing performed to validate your change
types,state,light,statesyncpackages)TestValidatorSetFromProto_NilProposerWithValidators: constructs a proto with valid validators but nil Proposer, verifies the proposer is correctly derived and is a member of the setTestValidatorSetFromProto_EmptyProtoRoundtrip: verifiesToProtoon empty set returns nil andFromProto(nil)returns errorTestLightBlockProtobufto expectToProtoerror on empty validator sets (previously only caught atFromPrototime)gofmt -s -lproduces no output