Is there an existing issue for this?
Current Behavior
We sometimes see many restarts of our controller pods, reported as `Reason: Error` with exit code 254.
The controller logs show:
```
2026-05-21T10:00:45.722Z [ERROR] agent.http: Request error: method=GET url=/v1/kv/scan/scanner/?recurse= from=127.0.0.1:51938 error="rpc error getting client: failed to get conn: rpc error: lead thread didn't get connection"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x1bf3d67]
goroutine 1 [running]:
github.com/neuvector/neuvector/controller/rest.LoadInitCfg(0x0, {0x31bf523, 0xa})
/src/controller/rest/configmap.go:908 +0x2e7
main.main()
/src/controller/controller.go:1043 +0x7765
2026-05-21T10:00:45|MON|Process ctrl exit status 2, signal 0, pid=7
2026-05-21T10:00:59.223Z [ERROR] agent.server.raft: failed to get log: index=1 error="log not found"
2026-05-21T10:00:59.820Z [ERROR] agent.server: error performing anti-entropy sync of federation state: error="context canceled"
Graceful leave complete
2026-05-21T10:01:02|MON|consul lan port is still open, command return = 0
2026-05-21T10:01:02|MON|Start ctrl, pid=131
...
...
...
2026-05-21T10:01:13.342|INFO|CTL|kv.(*clusterHelper).GetInstallationID: installation ID is updated - id=b25946b9df0c4752b4b70039ead92cbe
2026-05-21T10:01:17.429|ERRO|CTL|kv.clusterHelper.AcquireLock: Acquire lock error: Unable to acquire lock after 4s - key=lock/store_secret
2026-05-21T10:01:17.429|INFO|CTL|main.main: retry for store passphrase - err=Unable to acquire lock after 4s i=0
2026-05-21T10:01:22.465|ERRO|CTL|kv.clusterHelper.AcquireLock: Acquire lock error: Unable to acquire lock after 4s - key=lock/store_secret
2026-05-21T10:01:22.465|INFO|CTL|main.main: retry for store passphrase - err=Unable to acquire lock after 4s i=1
2026-05-21T10:01:27.676|ERRO|CTL|kv.clusterHelper.AcquireLock: Acquire lock error: Unable to acquire lock after 4s - key=lock/store_secret
2026-05-21T10:01:27.677|INFO|CTL|main.main: retry for store passphrase - err=Unable to acquire lock after 4s i=2
2026-05-21T10:01:32.78 |ERRO|CTL|kv.clusterHelper.AcquireLock: Acquire lock error: Unable to acquire lock after 4s - key=lock/store_secret
2026-05-21T10:01:32.78 |INFO|CTL|main.main: retry for store passphrase - err=Unable to acquire lock after 4s i=3
2026-05-21T10:01:33.297|INFO|CTL|cluster.StartCluster.func2: Lead check timer expired
2026-05-21T10:01:33.297|INFO|CTL|cluster.StartCluster.func2: Lead elected - lead=172.21.6.211:18300
2026-05-21T10:01:33.781|ERRO|CTL|main.main: Failed to read store passphrases - err=Unable to acquire lock after 4s
2026-05-21T10:01:33|MON|Process ctrl exit status 254, signal 0, pid=131
2026-05-21T10:01:33|MON|Process ctrl exit with non-recoverable return code. Monitor Exit!!
Leave the cluster
2026-05-21T10:01:33.977Z [ERROR] agent.server: error performing anti-entropy sync of federation state: error="node is not the leader"
Graceful leave complete
2026-05-21T10:01:42|MON|Clean up.
```
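
The log contains two different `ctrl` exits: status 2 after the nil pointer panic in `LoadInitCfg`, and status 254 after the `lock/store_secret` retries run out. To see how often each one occurs across restarts, the monitor lines can be tallied. This is only a rough sketch for triage: it assumes the `|MON|Process ctrl exit status N` line format seen in the log here, and the function name `count_ctrl_exits` is hypothetical, not part of NeuVector.

```python
import re
from collections import Counter

# Matches monitor lines such as:
# 2026-05-21T10:01:33|MON|Process ctrl exit status 254, signal 0, pid=131
# (format assumed from the log in this issue)
EXIT_RE = re.compile(r"\|MON\|Process ctrl exit status (\d+), signal (\d+), pid=(\d+)")


def count_ctrl_exits(log_text):
    """Return a Counter mapping ctrl exit status -> number of occurrences."""
    counts = Counter()
    for line in log_text.splitlines():
        match = EXIT_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Three monitor lines copied from the log in this issue.
sample = """\
2026-05-21T10:00:45|MON|Process ctrl exit status 2, signal 0, pid=7
2026-05-21T10:01:02|MON|Start ctrl, pid=131
2026-05-21T10:01:33|MON|Process ctrl exit status 254, signal 0, pid=131
"""

if __name__ == "__main__":
    for status, n in sorted(count_ctrl_exits(sample).items()):
        print(f"exit status {status}: {n}")
```

Feeding it the output of `kubectl logs --previous` for each restarted controller pod would show whether the restarts are dominated by the panic (status 2) or by the lock timeout (status 254).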
Expected Behavior
Leader election should be "painless", but here we are talking about 6 restarts per pod.
Steps To Reproduce
We decided to increase the number of controllers because the 3 existing controllers were hitting OOM (3G).
When we scaled to 5 controllers, the controller pods restarted many times.
Environment
- NeuVector Version: 5.5.0
- Deployment Method: Helm (neuvector/neuvector-helm)
- Platform: EKS
- ~13 nodes (aws instances)
- ~600 pods
- persistent data via EFS
Anything else?
No response