
Conversation


@fuweid fuweid commented Dec 5, 2025

There are three places where zombie membership information can be cleaned up:

1. ApplierV2 -> UpdateAttributes

   Ideally, each member updates its own attributes only once before serving
   requests. Since this happens only at startup and at low frequency, it is
   safe to clean up zombie members during this operation.

   Additionally, pending logs are always applied before attributes are updated,
   so there is no data race.

2. Applying a snapshot via Raft message

   When a member receives a snapshot through a Raft message, the v2store is
   synchronized with the v3store. This makes it safe to detect and clean up
   zombie membership entries at that time.

3. Applying ConfChange

   Check and clean up zombie members after each ConfChange. ConfChange
   entries are rare in practice, so the impact on performance should be
   minimal.

Fixes: #20967

Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: fuweid
Once this PR has been reviewed and has the lgtm label, please assign serathius for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@fuweid fuweid force-pushed the reproduce-20967 branch 2 times, most recently from e47654e to affcd21 Compare December 8, 2025 00:10
@fuweid fuweid changed the title [WIP on release-3.5] Reproduce 20967 [release-3.5] server: cleanup zombie membership information Dec 8, 2025
@fuweid fuweid marked this pull request as ready for review December 8, 2025 04:45
	if arch == "386" {
		arch = "amd64"
	}
	targetURL := fmt.Sprintf("https://github.com/etcd-io/etcd/releases/download/%s/etcd-%s-%s-%s.tar.gz",
Member

@serathius serathius Dec 8, 2025

Can you use existing mechanism for downloading previous releases?

Member Author

Updated

	"testing"
	"time"

	"github.com/stretchr/testify/require"
Member

curious why the golangci-lint workflow did not detect this minor issue? cc @ivanvc @joshjms

Suggested change
"github.com/stretchr/testify/require"
"github.com/stretchr/testify/require"

Member Author

Updated

Comment on lines +25 to +59
// createClusterWithZombieMembers creates a cluster with zombie members in the v3store.
//
// There are two ways to create zombie members:
//
// 1. Restoring a cluster using a snapshot taken from version <= v3.4.39.
//
// The `etcdctl snapshot restore` tool in versions <= v3.4.39 contains a bug
// that fails to clean up old member information during restore. This leaves
// behind stale (zombie) member entries in the v3store.
//
// 2. Using ForceNewCluster on a cluster running a version < v3.5.22.
//
// If ETCD commits a ConfChangeAddNode entry to the v3store but crashes before
// the corresponding hard state is persisted to the WAL, then after restarting
// with ForceNewCluster, the member entries in the v3store are not cleaned up,
// resulting in zombie members. This issue was fixed in [v3.5.22][1].
//
// It may be possible to create zombie members in versions < [v3.4.0][2] through a
// single-node ForceNewCluster restart. For example, suppose we start with a
// one-node cluster. We add a second node, but the second node fails to join
// and the cluster loses quorum. We then restart the first node with ForceNewCluster,
// which results in a zombie member entry for the second node. However, it
// is still unlikely for three zombie members (like [issue-20967][3]) to
// appear at the same time.
//
// So, in theory, this could also happen when using ForceNewCluster. However, it is
// extremely unlikely in practice because multiple members are running, and it
// would require all of them to panic at the same time. It would also require
// selecting one of the crashed members to restart with ForceNewCluster, where
// that member had failed to persist the hard state to the WAL, which is an
// improbable combination of events.
//
// So, in this function, we use the first method to create zombie members for
// testing.
//
// REF:
//
// [1]: https://github.com/etcd-io/etcd/commit/ccf66456e7237ef5251d9fdc96f1624d1aa4daf1
// [2]: https://github.com/etcd-io/etcd/issues/14370
// [3]: https://github.com/etcd-io/etcd/issues/20967
Member

Please update the comment per #20967 (comment)

Comment on lines +778 to +800
func CleanupZombieMembersIfNeeded(lg *zap.Logger, be backend.Backend, st v2store.Store) {
	v2Members, _ := membersFromStore(lg, st)
	v3Members, _ := membersFromBackend(lg, be)

	for id, v3Member := range v3Members {
		_, ok := v2Members[id]
		if !ok {
			lg.Warn("Removing zombie member from v3store", zap.String("member", fmt.Sprintf("%+v", *v3Member)))
			if err := unsafeDeleteMemberFromBackend(be, id); err != nil {
				lg.Warn("failed to delete zombie member from backend", zap.String("member-id", id.String()), zap.Error(err))
			}
		}
	}
}
Member

I am thinking we should probably sync from v2store to v3store to address the following cases:

  • Zombie members in the v3store (exist in v3, but not in v2). This has already been reproduced and confirmed.
  • Missing members in the v3store (exist in v2, but not in v3). We haven't seen any reproduction of this case so far. Since the v2store is the only source of truth for membership data in v3.5.x and older versions, we should completely trust it, and probably take the proactive measure of syncing from v2store to v3store for this case as well. WDYT?

Member Author

so we should completely trust v2store, so probably we should take a proactive measure to sync from v2store to v3store for such case as well. WDYT?

Agreed, so we can sync after ConfChange.

}

// restoreSnapshotIntoMemberDir restores snapshot into the given member's data dir.
func restoreSnapshotIntoMemberDir(t *testing.T, etcdctlBin string, snapshotPath string, initialCluster string, proc e2e.EtcdProcess) error {
Member

We should add such a function to tests/framework/e2e/etcdctl.go. I won't insist on it since these are stable releases and we won't add any new features, so I don't expect any major refactoring.

}

// downloadReleaseBinary downloads release binary from github release page.
func downloadReleaseBinary(t *testing.T, ver string, binaryName string) string {
Member

Similar to the above comment, this should be part of the e2e test framework, but I won't insist on it for the same reason as above.

Member Author

Removed this part and used a shell script to download the binary before the test.

	epc, err := e2e.NewEtcdProcessCluster(t,
		&e2e.EtcdProcessClusterConfig{
			Version:     e2e.LastVersion,
			ClusterSize: 1,
Member

Can we just create a 3-member cluster and remove the following AddMember logic? It's unnecessarily complicating the test case.

Suggested change
ClusterSize: 1,
ClusterSize: 3,

Member Author

Updated

Comment on lines 82 to 83
t.Log("Rejoining last member with new version...")
require.NoError(t, rejoinMemberWithNewVersion(t, epc, 2))
Member

can we remove this to keep the test as simple as possible?

Member Author

Updated

Comment on lines +503 to +510
		// Ideally, each member only updates its own attributes
		// once before serving requests. Since this happens only
		// on startup and at a low frequency, it should be safe
		// to clean up zombie members in this request.
		if id == c.localID && c.v2store != nil {
			c.lg.Info("checking and cleaning up zombie members after attribute update")
			CleanupZombieMembersIfNeeded(c.lg, c.be, c.v2store)
		}
Member

Based on the conversation with the reporter, they had never performed the etcdctl snapshot restore operation while they were on v3.4.x, so there are still some unknown causes. I think we need to syncMembershipDataFromV2ToV3 after each ConfChange, as mentioned in #20967 (comment). ConfChange entries are usually rare, so the impact on performance should be minimal.

Member Author

Updated.

There are three places where zombie membership information can be cleaned up:

1. ApplierV2 -> UpdateAttributes

   Ideally, each member updates its own attributes only once before serving
   requests. Since this happens only at startup and at low frequency, it is
   safe to clean up zombie members during this operation.

   Additionally, pending logs are always applied before attributes are updated,
   so there is no data race.

2. Applying a snapshot via Raft message

   When a member receives a snapshot through a Raft message, the v2store is
   synchronized with the v3store. This makes it safe to detect and clean up
   zombie membership entries at that time.

3. Applying ConfChange

   Check and clean up zombie members after each ConfChange. ConfChange
   entries are rare in practice, so the impact on performance should be
   minimal.

Signed-off-by: Wei Fu <[email protected]>