Resolve authz ConfigMap for VirtualMCPServer#5290
Conversation
There was a problem hiding this comment.
Large PR Detected
This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.
How to unblock this PR:
Add a section to your PR description with the following format:
## Large PR Justification
[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformationAlternative:
Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.
See our Contributing Guidelines for more details.
This review will be automatically dismissed once you add the justification section.
Large PR justification has been provided. Thank you!
|
✅ Large PR justification has been provided. The size review has been dismissed and this PR can now proceed with normal review. |
|
Heads up on dropping The four Cedar fields touched here split into two buckets:
So the second bucket went onto The non-breaking alternative was just leaving
The cost is the schema break: anyone setting |
bf2fafb to
7895464
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #5290 +/- ##
==========================================
- Coverage 68.41% 68.40% -0.02%
==========================================
Files 620 620
Lines 63316 63401 +85
==========================================
+ Hits 43317 43367 +50
- Misses 16772 16798 +26
- Partials 3227 3236 +9 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
7895464 to
f8c1696
Compare
f8c1696 to
12defec
Compare
A `VirtualMCPServer` with `spec.incomingAuth.authzConfig.type: configMap`
silently produced a vmcp `config.yaml` that referenced the unresolved
`configMap` type token. The vmcp binary's `AuthzConfig` validator only
accepts `cedar` or `none`, so the pod crashed in `CrashLoopBackOff` at
startup. Inline authz also silently dropped `GroupClaimName`,
`RoleClaimName`, `GroupEntityType`, and `EntitiesJSON`, so any enterprise
Cedar policy that walked a `Client → ClaimGroup → PlatformRole` hierarchy
denied every request because the runtime Cedar authorizer built
`THVGroup::` parents while the entity store contained `ClaimGroup::`
entities.
Wire the configMap path end-to-end, plumb the four missing fields
through both source paths, and move `PrimaryUpstreamProvider` onto the
auth server config where it belongs:
* Extract `LoadAuthzConfigFromConfigMap` as the shared fetch/parse/
validate helper in `controllerutil`; `AddAuthzConfigOptions` now
delegates to it. The vMCP converter calls the same helper so the
failure modes match the `MCPServer`/`MCPRemoteProxy` runner path.
* Extend `pkg/vmcp/config.AuthzConfig` with `EntitiesJSON`,
`GroupClaimName`, `RoleClaimName`, `GroupEntityType`, and forward
all four into `cedar.ConfigOptions` in the Cedar middleware factory.
`EntitiesJSON` defaults to `"[]"` when unset to preserve the
historical Cedar contract.
* Lift the source-agnostic Cedar JWT-claim mapping fields
(`GroupClaimName`, `RoleClaimName`, `GroupEntityType`) onto
`AuthzConfigRef` so they work identically for inline and configMap
users. For configMap users the parsed payload provides the default
and the spec-level field overrides when set.
* Move `PrimaryUpstreamProvider` onto `EmbeddedAuthServerConfig`
(`spec.authServerConfig.primaryUpstreamProvider` on
`VirtualMCPServer`). The field describes which upstream IDP token
Cedar reads claims from, which is a property of the embedded auth
server, not of the authz policies. Placing it next to
`upstreamProviders` co-locates the choice with the set it selects
from. Reachable from `MCPExternalAuthConfig` of type
`embeddedAuthServer` as well; on `MCPServer`/`MCPRemoteProxy`
(single-upstream consumers) the only validated values are empty
(auto-select) or the name of that single upstream. The legacy
`spec.incomingAuth.authzConfig.inline.primaryUpstreamProvider`
location is read as a backward-compatibility fallback for one
release with a `Warning` event of reason
`AuthzPrimaryUpstreamProviderDeprecated`.
* Pre-validate the referenced authz `ConfigMap` in the controller
and distinguish `NotFound` from other parse/validation failures
via two condition reasons on `AuthConfigured`: the existing shared
`AuthzConfigMapNotFound` and a new `AuthzConfigMapInvalid`. This
mirrors the diagnostic `MCPRemoteProxy` emits today and gives
users a status-level error before the converter fails opaquely.
* Validate the resolved `PrimaryUpstreamProvider` (from either
source) against the declared embedded auth server upstreams so an
unresolvable provider is rejected at convert time.
CRD compatibility: not a breaking change. The legacy
`spec.incomingAuth.authzConfig.inline.primaryUpstreamProvider`
location continues to work, with a Warning event emitted whenever it
is read; planned removal one release after the deprecation cycle.
The new fields on `AuthzConfigRef` (`GroupClaimName`, `RoleClaimName`,
`GroupEntityType`) and on `EmbeddedAuthServerConfig`
(`PrimaryUpstreamProvider`) are additive.
Closes #4919
Closes #5208
Closes #5277
The relocation of `PrimaryUpstreamProvider` onto `EmbeddedAuthServerConfig` in the parent commit introduced two new behaviours and surfaced one pre-existing coverage gap. None of the three were captured by a test. Add focused unit tests that pin each behaviour so a future refactor cannot silently drop it. * `*VirtualMCPServer.ExplicitPrimaryUpstreamProvider()` is the central precedence helper: canonical `spec.authServerConfig.primaryUpstreamProvider` wins, deprecated `spec.incomingAuth.authzConfig.inline.primaryUpstreamProvider` is the fallback, otherwise empty. A unit test covers all four combinations (canonical only, deprecated only, both, neither) and locks the `fromDeprecated` flag the caller uses to gate the deprecation event. * `AuthzConfigRef.DeprecatedInlinePrimaryUpstreamProvider()` is the helper the advisory paths on `MCPServer` and `MCPRemoteProxy` read. A unit test covers the nil receiver, nil `Inline` subtree, inline without the field, and inline with the field set. * `MCPServer.validateAuthzPrimaryUpstreamProviderIgnored` and the equivalent on `MCPRemoteProxy` set the `AuthzPrimaryUpstreamProviderIgnored` advisory condition when the deprecated field is set on a CR that has no embedded auth server to act on it. Both functions existed before this PR but had no test coverage at all. The relocation of the field did not change their behaviour, but the absence of coverage means a regression there would go unnoticed. Add table-driven tests that exercise the True-on-set, absent-when-unset, and stale-cleared cases. * `VirtualMCPServerReconciler.validateAuthzUpstreamAvailable` emits a `Warning` event with reason `AuthzPrimaryUpstreamProviderDeprecated` whenever it resolves the primary from the deprecated location, so operators see a kubectl-visible signal even when the deprecated value still validates. A test using `events.NewFakeRecorder` confirms the event fires for the deprecated path, does not fire for the canonical path, and does not fire when neither location is set. No production code changes; only tests.
12defec to
0925161
Compare
ChrisJBurns
left a comment
There was a problem hiding this comment.
Multi-Agent Consensus Review
Agents consulted: kubernetes-expert, go-security-reviewer, code-reviewer, toolhive-expert, go-expert-developer, documentation-writer
Consensus Summary
| # | Finding | Consensus | Severity | Action |
|---|---|---|---|---|
| 1 | New AuthzConfigRef claim-mapping fields dropped on MCPServer/MCPRemoteProxy | 10/10 | HIGH | Fix |
| 2 | virtualmcpserver-api.md still pins primaryUpstreamProvider at deprecated location | 10/10 | HIGH | Fix |
| 3 | virtualmcpserver-kubernetes-guide.md troubleshooting uses deprecated location | 10/10 | HIGH | Fix |
| 4 | EmbeddedAuthServerConfig.PrimaryUpstreamProvider neither validated nor wired on MCPServer/MCPRemoteProxy | 10/10 | MEDIUM | Fix or soften docstring |
| 5 | Comment drift on InlineAuthzConfig.PrimaryUpstreamProvider deprecation | 8/10 | MEDIUM | Fix |
| 6 | AuthzConfigMapInvalid not mirrored on MCPRemoteProxy validateK8sRefs | 8/10 | MEDIUM | Follow-up |
| 7 | No migration guidance for primaryUpstreamProvider move | 8/10 | MEDIUM | Note |
| 8 | Naming collision: ExplicitPrimaryUpstreamProvider on two types, different signatures | 7/10 | MEDIUM | Rename |
| 9 | convertAuthzConfig silently accepts unknown authz types | 7/10 | LOW | Guard |
| 10 | LoadAuthzConfigFromConfigMap dedicated tests missing | 8/10 | LOW | Tests |
| 11 | validateAuthzConfigMapRef doesn't run cedar.ExtractConfig | 9/10 | LOW | Add check |
| 12 | Architecture docs not updated | 8/10 | LOW | Optional |
| 13 | AuthzConfigMapInvalid undocumented in troubleshooting | 7/10 | LOW | Note |
| 14 | Deprecation Warning event fires every reconcile | 7/10 | LOW | Optional |
| 15 | Deprecation Warning event not emitted in no-auth-server branch | 7/10 | LOW | Emit |
Overall
The PR fixes two real bugs — the vmcp CrashLoopBackOff on type: configMap, and the EntitiesJSON="[]" silent-deny on Cedar policies that walk Client → ClaimGroup → PlatformRole. The vMCP-side wiring is clean: a shared LoadAuthzConfigFromConfigMap, distinct condition reasons (AuthzConfigMapNotFound vs new AuthzConfigMapInvalid), defense-in-depth at convert time, and thorough table-driven tests that lock the deprecation precedence and the event emission contract.
The findings cluster around one structural concern: the new fields are added to shared types (AuthzConfigRef for the JWT-claim mapping; EmbeddedAuthServerConfig for PrimaryUpstreamProvider) but the runtime wiring and the user-facing docs only land on the VirtualMCPServer consumer. MCPServer and MCPRemoteProxy users who follow the PR's own deprecation notes will hit silent no-ops. None of this blocks the vMCP fix shipping, but the field additions to AuthzConfigRef are particularly worth resolving before merge — either by threading them through addAuthzInlineConfigOptions and the configMap branch of AddAuthzConfigOptions, or by tightening the docstrings to say "VirtualMCPServer-only" with an explicit advisory condition on the other CRDs.
The remaining items are smaller: a few low-severity test gaps, a default: arm missing on convertAuthzConfig, the docs sync for the user-facing guides, and a couple of minor consistency nits in the validator.
Documentation
docs/operator/virtualmcpserver-api.md:90still listsprimaryUpstreamProvideronly atauthzConfig.inline.primaryUpstreamProvider. The canonical location (spec.authServerConfig.primaryUpstreamProvider) and the newgroupClaimName/roleClaimName/groupEntityTypefields onAuthzConfigRefare absent. [F2]docs/operator/virtualmcpserver-kubernetes-guide.md:631-637troubleshooting block pins primary upstream at the deprecated inline location; following the guide will produceAuthzPrimaryUpstreamProviderDeprecatedevents on every reconcile. [F3]- No CHANGELOG / RELEASE-NOTES / migration doc captures the deprecation. Consider a short "Migration: primaryUpstreamProvider location" subsection in the kubernetes guide. The Warning event message text is currently the only guidance and it's buried in
kubectl get events. [F7] docs/arch/10-virtual-mcp-architecture.md(or09-operator-architecture.md) doesn't describe the new sharedLoadAuthzConfigFromConfigMapresolver, the spec-vs-ConfigMap precedence for claim-mapping fields, or thePrimaryUpstreamProviderrelocation. Optional follow-up. [F12]- The
AuthConfigured=Falsereasons (AuthzConfigMapNotFound,AuthzConfigMapInvalid) aren't called out in the "Authorization Policy Errors" troubleshooting block of the kubernetes guide. [F13]
Generated with Claude Code
| @@ -213,59 +282,14 @@ func AddAuthzConfigOptions( | |||
| return addAuthzInlineConfigOptions(authzRef, options) | |||
|
|
|||
| case mcpv1beta1.AuthzConfigTypeConfigMap: | |||
There was a problem hiding this comment.
[F1 HIGH — consensus 10/10] New AuthzConfigRef claim-mapping fields silently dropped on MCPServer/MCPRemoteProxy runner path.
The PR adds GroupClaimName, RoleClaimName, and GroupEntityType to the shared AuthzConfigRef type (used by all three CRDs) and the CRD docstring explicitly promises spec-over-ConfigMap override semantics. The vMCP converter (convertAuthzConfig) threads them through to cedar.ConfigOptions. But neither addAuthzInlineConfigOptions (line 169, the inline runner path) nor this configMap branch applies them — the loaded *authz.Config is passed straight to runner.WithAuthzConfig. Setting any of these fields on an MCPServer or MCPRemoteProxy is a silent no-op at runtime, which contradicts both the field documentation and the "no silent no-ops" rule in .claude/rules/go-style.md.
Fix: thread the three fields into the cedar.ConfigOptions built by addAuthzInlineConfigOptions; in this configMap branch, after LoadAuthzConfigFromConfigMap, apply the spec-wins-over-ConfigMap override the vMCP converter performs (mutating a copy of cfg). Add a unit test asserting the values flow into the built runner.WithAuthzConfig argument — there is currently zero coverage of the new fields on the MCPServer/MCPRemoteProxy code paths.
Raised by code-reviewer, toolhive-expert, go-security-reviewer.
| // +kubebuilder:validation:MinLength=1 | ||
| // +kubebuilder:validation:MaxLength=63 | ||
| // +kubebuilder:validation:Pattern=`^[a-z0-9]([a-z0-9-]*[a-z0-9])?$` | ||
| PrimaryUpstreamProvider string `json:"primaryUpstreamProvider,omitempty"` |
There was a problem hiding this comment.
[F4 MEDIUM — consensus 10/10] This field is unreachable on the MCPServer/MCPRemoteProxy path.
The docstring claims "on MCPServer and MCPRemoteProxy (single-upstream consumers) the only validated values are empty or the name of the sole upstream" — but no validator in mcpserver_controller.go, mcpremoteproxy_controller.go, or pkg/controllerutil/ references EmbeddedAuthServerConfig.PrimaryUpstreamProvider, and the field is never threaded into the proxyrunner configuration. The structural advisory AuthzPrimaryUpstreamProviderIgnored only inspects the legacy InlineAuthzConfig.PrimaryUpstreamProvider. Result: users following the deprecation guidance "move the value to embeddedAuthServer.primaryUpstreamProvider" get no error, no warning, and no effect.
Fix: either (a) wire the new field through the MCPServer/MCPRemoteProxy → MCPExternalAuthConfig consumer and emit AuthzPrimaryUpstreamProviderIgnored when set on those CRDs, or (b) soften this docstring to say the field is structurally meaningful only on VirtualMCPServer until further notice.
Raised by kubernetes-expert, toolhive-expert.
| // | ||
| // Deprecated: this field has moved to | ||
| // spec.authServerConfig.primaryUpstreamProvider on VirtualMCPServer (or to | ||
| // the embeddedAuthServer.primaryUpstreamProvider field of a referenced |
There was a problem hiding this comment.
[F5 MEDIUM — consensus 8/10] The deprecation block self-contradicts.
The first half points users to "the embeddedAuthServer.primaryUpstreamProvider field of a referenced MCPExternalAuthConfig for MCPServer and MCPRemoteProxy". The second half then says "the field remains structurally meaningless (no embedded auth server runtime) and continues to surface the AuthzPrimaryUpstreamProviderIgnored advisory". Both can't be true — either there's a canonical replacement on MCPServer/MCPRemoteProxy or there isn't. Per .claude/rules/go-style.md "Keep Comments Synchronized With Code".
Fix: pick one stance. If MCPServer/MCPRemoteProxy have no canonical home today (matches the actual wiring — see F4), reword to: "this field has always been a structural no-op on MCPServer and MCPRemoteProxy (no embedded auth server). The deprecation only applies to VirtualMCPServer, where it has moved to spec.authServerConfig.primaryUpstreamProvider."
Raised by code-reviewer.
There was a problem hiding this comment.
I read the two sentences as describing different fields rather than contradicting each other:
- Sentence 1 ("this field has moved to …
embeddedAuthServer.primaryUpstreamProvider… for MCPServer and MCPRemoteProxy") names the new canonical location. - Sentence 2 ("On MCPServer and MCPRemoteProxy the field remains structurally meaningless") describes this very field — the deprecated
InlineAuthzConfig.PrimaryUpstreamProvider.
Both can be true simultaneously: the new location is the canonical replacement, and the deprecated form on this type was always a no-op on MCPServer/MCPRemoteProxy.
The real issue underneath this is your F4: the docstring promises a canonical replacement on those CR types that isn't wired. If F4 is resolved by wiring the field, this docstring is accurate as-written; if F4 is resolved by softening, the same softening probably needs to apply here. I'd treat this as subsumed by F4.
| // Precedence: the canonical location wins if set; the deprecated location is | ||
| // read only as a backward-compatibility fallback. Callers should emit a | ||
| // Warning event when fromDeprecated is true. | ||
| func (r *VirtualMCPServer) ExplicitPrimaryUpstreamProvider() (name string, fromDeprecated bool) { |
There was a problem hiding this comment.
[F8 MEDIUM — consensus 7/10] ExplicitPrimaryUpstreamProvider now exists on two related types with different signatures.
AuthzConfigRef.ExplicitPrimaryUpstreamProvider() was renamed to DeprecatedInlinePrimaryUpstreamProvider(), and this new method takes its name with a different return shape ((name string, fromDeprecated bool)). A reader grepping for ExplicitPrimaryUpstreamProvider will hit two definitions and can't tell at a glance which receiver/return is in play — exactly the kind of API confusion that bites navigation.
Fix: either rename this to ResolveExplicitPrimaryUpstreamProvider() or PrimaryUpstreamProviderWithSource() to signal the precedence-resolution semantic, or fold the deprecated-fallback logic into a single method AuthzConfigRef.PrimaryUpstreamProvider() (name string, fromDeprecated bool) and have VirtualMCPServer compose the two locations.
Raised by code-reviewer.
There was a problem hiding this comment.
I don't think the rename creates a collision — it removes one:
(*VirtualMCPServer).ExplicitPrimaryUpstreamProvider() (string, bool)is the only definition ofExplicitPrimaryUpstreamProvider.(*AuthzConfigRef).DeprecatedInlinePrimaryUpstreamProvider() stringis the only definition ofDeprecatedInlinePrimaryUpstreamProvider.
A grep for either name lands on exactly one definition; the receiver disambiguates at every call site. The pre-rename world was ambiguous (ExplicitPrimaryUpstreamProvider on two receivers), but the rename is precisely the fix. I'd leave this as-is.
| // condition; the caller is responsible for persisting status. | ||
| func (*MCPRemoteProxyReconciler) validateAuthzPrimaryUpstreamProviderIgnored(proxy *mcpv1beta1.MCPRemoteProxy) { | ||
| provider := proxy.Spec.AuthzConfig.ExplicitPrimaryUpstreamProvider() | ||
| provider := proxy.Spec.AuthzConfig.DeprecatedInlinePrimaryUpstreamProvider() |
There was a problem hiding this comment.
[F6 MEDIUM (parity) — consensus 8/10] MCPRemoteProxy lacks the new AuthzConfigMapInvalid reason.
MCPRemoteProxyReconciler.validateK8sRefs still routes every ConfigMap failure (NotFound, missing key, malformed payload, Validate-fail) through ConditionReasonAuthzConfigMapNotFound. After this PR, VirtualMCPServer distinguishes NotFound from Invalid by calling the shared LoadAuthzConfigFromConfigMap. A malformed but present authz ConfigMap on MCPRemoteProxy still leads to pod CrashLoopBackOff with no actionable status — the exact failure mode this PR is fixing on VMCP. Per .claude/rules/operator.md "Status Condition Parity".
Fix: in a follow-up PR (acceptable as a tracking issue linked from here), switch validateK8sRefs to use LoadAuthzConfigFromConfigMap and add ConditionReasonAuthzConfigMapInvalid in mcpremoteproxy_types.go. The shared loader makes this mechanical.
Raised by kubernetes-expert.
| authzType := authzRef.Type | ||
| if authzType == authzLabelValueInline || authzType == mcpv1beta1.AuthzConfigTypeConfigMap { | ||
| authzType = "cedar" | ||
| } |
There was a problem hiding this comment.
[F9 LOW — consensus 7/10] No default: arm on the switch — unknown authzRef.Type silently produces a Cedar config with zero policies.
The CRD enum (Enum: [configMap inline]) blocks unknown types at admission today, but this converter alone has no defense. A CLI dry-run, a webhook running against a stale schema, or a future authz type would walk past the inline/configMap cases with authz.Type = "cedar" and empty Policies. Cedar then denies every request silently.
Fix: add a default: arm that returns fmt.Errorf("unsupported authz config type: %q", authzRef.Type) — mirrors the equivalent guard in AddAuthzConfigOptions (controllerutil/authz.go line 295). The path-mapping comment two lines up ("other types pass through (e.g. 'configMap' → 'cedar' once we populate policies)") also no longer matches the code and should be tightened or dropped.
Raised by go-expert-developer.
| // | ||
| // The returned *authz.Config is safe to embed directly into RunConfig (via | ||
| // runner.WithAuthzConfig) or to read field-by-field for the vMCP converter. | ||
| func LoadAuthzConfigFromConfigMap( |
There was a problem hiding this comment.
[F10 LOW — consensus 8/10] No dedicated unit tests for LoadAuthzConfigFromConfigMap.
The new helper has six distinct failure modes (nil ref, wrong type, missing name, nil client, NotFound, missing key, empty value, malformed payload, Validate() failure). The converter test (converter_authz_configmap_test.go) covers the apiserver-side errors (NotFound, missing key, empty, malformed) but not (a) nil/wrong-type authzRef, (b) nil client, or (c) the highest-value case — a payload that parses successfully but fails cfg.Validate() (e.g. zero policies). The latter is the only thing protecting Cedar from a permit-nothing config.
Fix: add a focused TestLoadAuthzConfigFromConfigMap in cmd/thv-operator/pkg/controllerutil/authz_test.go with table entries for each failure mode and at least one "valid JSON, fails Validate" case.
Raised by go-expert-developer.
There was a problem hiding this comment.
The failure modes are covered, just indirectly via TestAddAuthzConfigOptions (which is LoadAuthzConfigFromConfigMap's sole public consumer):
- nil / wrong-type
authzRef→authz_test.go:329("Nil authz ref returns nil"),:367("Inline type with nil inline config returns error") - nil client →
:431("ConfigMap type with nil client returns error") - valid JSON that fails
cfg.Validate()→:542("ConfigMap type with invalid config returns error") - plus NotFound, missing key, empty value, custom key (lines 453, 476, 509, 616)
Every failure mode is exercised — just not under a TestLoadAuthzConfigFromConfigMap name. A direct test would be nice for explicit coverage but doesn't add new behavioral coverage. I'd treat this as nice-to-have rather than required.
| // rather than as a generic conversion failure. Inline and absent authzConfig are no-ops. | ||
| // | ||
| // Mirrors the pattern in mcpremoteproxy_controller.go's validateK8sRefs. | ||
| func (r *VirtualMCPServerReconciler) validateAuthzConfigMapRef( |
There was a problem hiding this comment.
[F11 LOW — consensus 9/10] validateAuthzConfigMapRef discards the loaded config and bypasses cedar.ExtractConfig.
The pre-validator calls LoadAuthzConfigFromConfigMap and ignores the return value. The converter (convertAuthzConfig) then re-loads the same ConfigMap and additionally runs cedar.ExtractConfig on it. A payload that parses as a valid authz.Config but isn't Cedar-flavored (wrong type field, or a future HTTP authorizer) will pass this pre-validator → set AuthConfigured=True → fail at convert time with a different error. The two should agree on what "valid" means.
Fix: either (a) add a cedar.ExtractConfig(loaded) smoke check here, or (b) refactor LoadAuthzConfigFromConfigMap to assert Cedar-ness so both sites use the same definition. Add a test case for "valid authz.Config but type != cedarv1" returning AuthzConfigMapInvalid.
Raised by code-reviewer, kubernetes-expert.
| @@ -623,14 +623,13 @@ func (*VirtualMCPServerReconciler) validateAuthzUpstreamAvailable( | |||
| // admission for the same "fail loudly instead of denying every request" | |||
| // reason as the configured-AS mismatch path below. | |||
| if vmcp.Spec.AuthServerConfig == nil { | |||
There was a problem hiding this comment.
[F15 LOW — consensus 7/10] Deprecation Warning event not emitted in the no-auth-server branch.
When vmcp.Spec.AuthServerConfig == nil, the validator calls vmcp.ExplicitPrimaryUpstreamProvider() and discards the fromDeprecated flag (explicitProvider, _ := ...). Lower at line 674, when AuthServerConfig != nil, the same call emits AuthzPrimaryUpstreamProviderDeprecated whenever fromDeprecated is true. So a user mid-migration (deprecated location set, auth server not yet configured) gets the hard reject but never sees the deprecation hint about the new field location.
Fix: emit the same AuthzPrimaryUpstreamProviderDeprecated Warning event in this branch before calling rejectAuthzAdmission, or extract a small helper that both branches call.
Raised by toolhive-expert.
| // mismatch would cause Cedar to deny every request at runtime — fail loudly | ||
| // at admission instead. | ||
| explicitProvider, fromDeprecated := vmcp.ExplicitPrimaryUpstreamProvider() | ||
| if explicitProvider != "" && fromDeprecated && r.Recorder != nil { |
There was a problem hiding this comment.
[F14 LOW — consensus 7/10] Deprecation Warning event fires on every reconcile cycle while the deprecated field is set.
Kubernetes event aggregation will dedupe within 10 minutes, so the cardinality is bounded — but VMCP reconciles can be frequent (watch resync, dependent resource changes), so this will fire often during the deprecation window. The advisory condition AuthzPrimaryUpstreamProviderIgnored already uses the better pattern (metav1.Condition with LastTransitionTime → emit event on transition only).
Fix (optional): gate this Eventf on an AuthzPrimaryUpstreamProviderDeprecated status condition so the event fires only on first observation per generation. Acceptable to defer if the team is comfortable with event-aggregation dedup as the only throttle.
Raised by kubernetes-expert.
There was a problem hiding this comment.
I think this is OK as-is for now. Chris's own comment acknowledges that Event aggregation dedupes within ~10 minutes, so practical cardinality is bounded.
The cleaner fix — gate emission on the False→True transition of an AuthzPrimaryUpstreamProviderDeprecated condition — requires introducing that condition, which is a design decision (do we want a Deprecated condition mirroring the existing Ignored one?). If the team adds that condition later, this throttling falls out naturally.
I'd dismiss this as a nit until/unless we decide on the broader Deprecated-condition design.
jhrozek
left a comment
There was a problem hiding this comment.
One non-blocking nit from cross-checking the review pass.
| return nil, fmt.Errorf( | ||
| "authz primaryUpstreamProvider %q does not match any upstream declared "+ | ||
| "on spec.authServerConfig.upstreamProviders", explicit) | ||
| cedarCfg, err := cedar.ExtractConfig(loaded) |
There was a problem hiding this comment.
Nit / follow-up suggestion — not blocking.
The converter imports pkg/authz/authorizers/cedar solely to call ExtractConfig here. The runner-path consumer of the same shared loader (AddAuthzConfigOptions in controllerutil/authz.go) stays authorizer-agnostic and forwards *authz.Config opaquely. Adding a future authorizer means editing the converter; the runner path doesn't have that pressure.
Consider extracting a sibling helper in controllerutil — e.g. ExtractCedarAuthzOptions(*authz.Config) (*cedar.ConfigOptions, error) — so the converter imports only controllerutil, not cedar. Eliminates the layering smell and keeps Cedar-specific glue in one package.
Also relates to F1: when the spec-over-ConfigMap precedence logic is added to controllerutil/authz.go, it'd be a natural fit to land alongside the extracted helper so both code paths share precedence resolution.
|
Nit / drift-prevention observation — not blocking, doesn't anchor on a changed line so filing as a PR comment instead of inline.
Suggested options:
Filing as a regular PR comment because line 99 is outside the diff hunks of this PR. |
A
VirtualMCPServerwithspec.incomingAuth.authzConfig.type: configMapsilently produced a vmcpconfig.yamlthat referenced the unresolvedconfigMaptype token. The vmcp binary'sAuthzConfigvalidator only acceptscedarornone, so the pod crashed inCrashLoopBackOffat startup. Inline authz also silently droppedGroupClaimName,RoleClaimName,GroupEntityType, andEntitiesJSON, so any enterprise Cedar policy that walked aClient → ClaimGroup → PlatformRolehierarchy denied every request because the runtime Cedar authorizer builtTHVGroup::parents while the entity store containedClaimGroup::entities.Wire the configMap path end-to-end, plumb the four missing fields through both source paths, and move
PrimaryUpstreamProvideronto the auth server config where it belongs:Extract
LoadAuthzConfigFromConfigMapas the shared fetch/parse/ validate helper incontrollerutil;AddAuthzConfigOptionsnow delegates to it. The vMCP converter calls the same helper so the failure modes match theMCPServer/MCPRemoteProxyrunner path.Extend
pkg/vmcp/config.AuthzConfigwithEntitiesJSON,GroupClaimName,RoleClaimName,GroupEntityType, and forward all four intocedar.ConfigOptionsin the Cedar middleware factory.EntitiesJSONdefaults to"[]"when unset to preserve the historical Cedar contract.Lift the source-agnostic Cedar JWT-claim mapping fields (
GroupClaimName,RoleClaimName,GroupEntityType) ontoAuthzConfigRefso they work identically for inline and configMap users. For configMap users the parsed payload provides the default and the spec-level field overrides when set.Move
PrimaryUpstreamProviderontoEmbeddedAuthServerConfig(spec.authServerConfig.primaryUpstreamProvideronVirtualMCPServer). The field describes which upstream IDP token Cedar reads claims from, which is a property of the embedded auth server, not of the authz policies. Placing it next toupstreamProvidersco-locates the choice with the set it selects from. Reachable fromMCPExternalAuthConfigof typeembeddedAuthServeras well; onMCPServer/MCPRemoteProxy(single-upstream consumers) the only validated values are empty (auto-select) or the name of that single upstream. The legacyspec.incomingAuth.authzConfig.inline.primaryUpstreamProviderlocation is read as a backward-compatibility fallback for one release with aWarningevent of reasonAuthzPrimaryUpstreamProviderDeprecated.Pre-validate the referenced authz
ConfigMapin the controller and distinguishNotFoundfrom other parse/validation failures via two condition reasons onAuthConfigured: the existing sharedAuthzConfigMapNotFoundand a newAuthzConfigMapInvalid. This mirrors the diagnosticMCPRemoteProxyemits today and gives users a status-level error before the converter fails opaquely.Validate the resolved
PrimaryUpstreamProvider(from either source) against the declared embedded auth server upstreams so an unresolvable provider is rejected at convert time.CRD compatibility: not a breaking change. The legacy
spec.incomingAuth.authzConfig.inline.primaryUpstreamProviderlocation continues to work, with a Warning event emitted whenever it is read; planned removal one release after the deprecation cycle. The new fields onAuthzConfigRef(GroupClaimName,RoleClaimName,GroupEntityType) and onEmbeddedAuthServerConfig(PrimaryUpstreamProvider) are additive.Closes #4919
Closes #5208
Closes #5277
Large PR Justification
~950 lines of code are relevant, everything else is auto-generated.