pods controller restart - error performing anti-entropy sync #576

### Is there an existing issue for this?

- [x] I have searched the existing issues

### Current Behavior

We sometimes see many restarts of our controller pods, reported as `Reason: Error - exit code: 254`.
The controller logs show:

```
2026-05-21T10:00:45.722Z [ERROR] agent.http: Request error: method=GET url=/v1/kv/scan/scanner/?recurse= from=127.0.0.1:51938 error="rpc error getting client: failed to get conn: rpc error: lead thread didn't get connection"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x1bf3d67]

goroutine 1 [running]:
github.com/neuvector/neuvector/controller/rest.LoadInitCfg(0x0, {0x31bf523, 0xa})
        /src/controller/rest/configmap.go:908 +0x2e7
main.main()
        /src/controller/controller.go:1043 +0x7765
2026-05-21T10:00:45|MON|Process ctrl exit status 2, signal 0, pid=7
2026-05-21T10:00:59.223Z [ERROR] agent.server.raft: failed to get log: index=1 error="log not found"
2026-05-21T10:00:59.820Z [ERROR] agent.server: error performing anti-entropy sync of federation state: error="context canceled"
Graceful leave complete
2026-05-21T10:01:02|MON|consul lan port is still open, command return = 0
2026-05-21T10:01:02|MON|Start ctrl, pid=131
...
...
...
2026-05-21T10:01:13.342|INFO|CTL|kv.(*clusterHelper).GetInstallationID: installation ID is updated - id=b25946b9df0c4752b4b70039ead92cbe
2026-05-21T10:01:17.429|ERRO|CTL|kv.clusterHelper.AcquireLock: Acquire lock error: Unable to acquire lock after 4s - key=lock/store_secret
2026-05-21T10:01:17.429|INFO|CTL|main.main: retry for store passphrase - err=Unable to acquire lock after 4s i=0
2026-05-21T10:01:22.465|ERRO|CTL|kv.clusterHelper.AcquireLock: Acquire lock error: Unable to acquire lock after 4s - key=lock/store_secret
2026-05-21T10:01:22.465|INFO|CTL|main.main: retry for store passphrase - err=Unable to acquire lock after 4s i=1
2026-05-21T10:01:27.676|ERRO|CTL|kv.clusterHelper.AcquireLock: Acquire lock error: Unable to acquire lock after 4s - key=lock/store_secret
2026-05-21T10:01:27.677|INFO|CTL|main.main: retry for store passphrase - err=Unable to acquire lock after 4s i=2
2026-05-21T10:01:32.78 |ERRO|CTL|kv.clusterHelper.AcquireLock: Acquire lock error: Unable to acquire lock after 4s - key=lock/store_secret
2026-05-21T10:01:32.78 |INFO|CTL|main.main: retry for store passphrase - err=Unable to acquire lock after 4s i=3
2026-05-21T10:01:33.297|INFO|CTL|cluster.StartCluster.func2: Lead check timer expired
2026-05-21T10:01:33.297|INFO|CTL|cluster.StartCluster.func2: Lead elected - lead=172.21.6.211:18300
2026-05-21T10:01:33.781|ERRO|CTL|main.main: Failed to read store passphrases - err=Unable to acquire lock after 4s
2026-05-21T10:01:33|MON|Process ctrl exit status 254, signal 0, pid=131
2026-05-21T10:01:33|MON|Process ctrl exit with non-recoverable return code. Monitor Exit!!
Leave the cluster
2026-05-21T10:01:33.977Z [ERROR] agent.server: error performing anti-entropy sync of federation state: error="node is not the leader"
Graceful leave complete
2026-05-21T10:01:42|MON|Clean up.
```
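
For anyone triaging similar restarts, the second startup attempt in the log above can be summarized with a short script. This is only a sketch: it assumes the `timestamp|LEVEL|CTL|...` and `timestamp|MON|...` line layouts seen in this excerpt, and the name `summarize_lock_retries` and the embedded sample are illustrative, not part of NeuVector.

```python
import re

# Sample lines copied from the controller log in this report (second start attempt).
SAMPLE_LOG = """\
2026-05-21T10:01:17.429|ERRO|CTL|kv.clusterHelper.AcquireLock: Acquire lock error: Unable to acquire lock after 4s - key=lock/store_secret
2026-05-21T10:01:17.429|INFO|CTL|main.main: retry for store passphrase - err=Unable to acquire lock after 4s i=0
2026-05-21T10:01:22.465|ERRO|CTL|kv.clusterHelper.AcquireLock: Acquire lock error: Unable to acquire lock after 4s - key=lock/store_secret
2026-05-21T10:01:27.676|ERRO|CTL|kv.clusterHelper.AcquireLock: Acquire lock error: Unable to acquire lock after 4s - key=lock/store_secret
2026-05-21T10:01:32.78 |ERRO|CTL|kv.clusterHelper.AcquireLock: Acquire lock error: Unable to acquire lock after 4s - key=lock/store_secret
2026-05-21T10:01:33.781|ERRO|CTL|main.main: Failed to read store passphrases - err=Unable to acquire lock after 4s
2026-05-21T10:01:33|MON|Process ctrl exit status 254, signal 0, pid=131
"""

# Patterns are derived from the log lines above, not from NeuVector source code.
LOCK_ERROR = re.compile(r"AcquireLock: Acquire lock error: .* - key=(?P<key>\S+)")
EXIT_STATUS = re.compile(r"Process ctrl exit status (?P<code>\d+)")


def summarize_lock_retries(log_text):
    """Count failed lock acquisitions per key and collect ctrl exit codes."""
    failures = {}
    exit_codes = []
    for line in log_text.splitlines():
        match = LOCK_ERROR.search(line)
        if match:
            key = match.group("key")
            failures[key] = failures.get(key, 0) + 1
            continue
        match = EXIT_STATUS.search(line)
        if match:
            exit_codes.append(int(match.group("code")))
    return failures, exit_codes


if __name__ == "__main__":
    failures, exit_codes = summarize_lock_retries(SAMPLE_LOG)
    print("failed lock acquisitions per key:", failures)
    print("ctrl exit codes:", exit_codes)
```

Run against the excerpt, this reports four failed attempts on `lock/store_secret` followed by exit status 254, which matches the restart reason shown by Kubernetes.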



### Expected Behavior

Leader election should be "painless", but here we are seeing as many as 6 restarts per pod.

### Steps To Reproduce

We decided to increase the number of controllers because the 3 existing controllers were hitting OOM (3G).
When we scaled to 5 controllers, the controller pods restarted many times.

### Environment

```markdown
- NeuVector Version: 5.5.0
- Deployment Method: Helm (neuvector/neuvector-helm)
- Platform: EKS
  - ~13 nodes (aws instances)
  - ~600 pods
  - persistent data via EFS
```

### Anything else?

_No response_
