Skip to content

[Bug] MCPRegistry stuck in RBACFailed on operator 0.28.0 — registry-api Role still grants core "events" verb that ClusterRole no longer holds #5339

@danbarr

Description

@danbarr

Summary

After upgrading to toolhive-operator v0.28.0, all MCPRegistry resources are stuck in phase: Failed with reason: RBACFailed. The operator cannot create the per-registry Role because it attempts to grant ""/events: [create, patch] — a permission the operator's own ClusterRole no longer holds after #5243 moved its events grant to the events.k8s.io API group. Kubernetes RBAC escalation prevention rejects the upsert.


Environment

Component Version
toolhive-operator Helm chart 0.28.0
toolhive-operator-crds Helm chart 0.28.0
Operator app version 0.28.0
Kubernetes server v1.33.x (kind)

Steps to Reproduce

  1. Install CRDs + operator from chart 0.28.0.
  2. Apply any MCPRegistry resource.
  3. Observe:
$ kubectl get mcpregistry -A
NAMESPACE         NAME                STATUS   READY   AGE
toolhive-system   toolhive-registry   Failed   False   2m
  1. Status condition:
Failed to ensure RBAC resources: failed to ensure role:
failed to upsert role toolhive-registry-registry-api in namespace toolhive-system:
roles.rbac.authorization.k8s.io "toolhive-registry-registry-api" is forbidden:
user "system:serviceaccount:toolhive-system:toolhive-operator" ...
is attempting to grant RBAC permissions not currently held:
{APIGroups:[""], Resources:["events"], Verbs:["create" "patch"]}

Root Cause

#5243 ("Fix operator RBAC for event recording") moved the operator's own events grant from the core API group to events.k8s.io to match the controller-runtime upgrade in #4183:

# deploy/charts/operator/templates/clusterrole/role.yaml (post-#5243)
- apiGroups: [events.k8s.io]      # was ""
  resources: [events]
  verbs: [create, patch]

But the registry-api Role builder still requests the core API group:

cmd/thv-operator/pkg/registryapi/rbac.go:56-61

// Event creation for leader election status
{
    APIGroups: []string{""},
    Resources: []string{"events"},
    Verbs:     []string{"create", "patch"},
},

Because the operator SA can no longer create/patch events in "", K8s' RBAC escalation prevention forbids it from creating a Role that grants those verbs to a downstream SA.

auth can-i confirms:

$ kubectl auth can-i create events --as=system:serviceaccount:toolhive-system:toolhive-operator -n toolhive-system
no

Workaround

Add the core-group rule back to the operator's ClusterRole post-install:

kubectl patch clusterrole toolhive-operator-manager-role --type=json \
  -p='[{"op":"add","path":"/rules/-","value":{"apiGroups":[""],"resources":["events"],"verbs":["create","patch"]}}]'

Overwritten on helm upgrade.


Expected Fix

Either change the registry-api Role to use events.k8s.io (matching the operator's own permissions and the modern events API), or re-add ""/events to the operator's ClusterRole template alongside events.k8s.io/events. The first option is consistent with #5243's direction.

The same ""/events rule appears to be present for the other workload controllers too (mcpserver, mcpremoteproxy, virtualmcpserver, embeddingserver) — worth auditing those Role builders for the same drift.

Metadata

Metadata

Assignees

Labels

bugSomething isn't workinggoPull requests that update go codekubernetesItems related to Kubernetesoperator
No fields configured for Bug 🐞.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions