Skip to content

Resolve authz ConfigMap for VirtualMCPServer#5290

Open
blkt wants to merge 2 commits into
mainfrom
feat/vmcp-resolve-authz-configmap
Open

Resolve authz ConfigMap for VirtualMCPServer#5290
blkt wants to merge 2 commits into
mainfrom
feat/vmcp-resolve-authz-configmap

Conversation

@blkt
Copy link
Copy Markdown
Contributor

@blkt blkt commented May 15, 2026

A VirtualMCPServer with spec.incomingAuth.authzConfig.type: configMap silently produced a vmcp config.yaml that referenced the unresolved configMap type token. The vmcp binary's AuthzConfig validator only accepts cedar or none, so the pod crashed in CrashLoopBackOff at startup. Inline authz also silently dropped GroupClaimName, RoleClaimName, GroupEntityType, and EntitiesJSON, so any enterprise Cedar policy that walked a Client → ClaimGroup → PlatformRole hierarchy denied every request because the runtime Cedar authorizer built THVGroup:: parents while the entity store contained ClaimGroup:: entities.

Wire the configMap path end-to-end, plumb the four missing fields through both source paths, and move PrimaryUpstreamProvider onto the auth server config where it belongs:

  • Extract LoadAuthzConfigFromConfigMap as the shared fetch/parse/ validate helper in controllerutil; AddAuthzConfigOptions now delegates to it. The vMCP converter calls the same helper so the failure modes match the MCPServer/MCPRemoteProxy runner path.

  • Extend pkg/vmcp/config.AuthzConfig with EntitiesJSON, GroupClaimName, RoleClaimName, GroupEntityType, and forward all four into cedar.ConfigOptions in the Cedar middleware factory. EntitiesJSON defaults to "[]" when unset to preserve the historical Cedar contract.

  • Lift the source-agnostic Cedar JWT-claim mapping fields (GroupClaimName, RoleClaimName, GroupEntityType) onto AuthzConfigRef so they work identically for inline and configMap users. For configMap users the parsed payload provides the default and the spec-level field overrides when set.

  • Move PrimaryUpstreamProvider onto EmbeddedAuthServerConfig (spec.authServerConfig.primaryUpstreamProvider on VirtualMCPServer). The field describes which upstream IDP token Cedar reads claims from, which is a property of the embedded auth server, not of the authz policies. Placing it next to upstreamProviders co-locates the choice with the set it selects from. Reachable from MCPExternalAuthConfig of type embeddedAuthServer as well; on MCPServer/MCPRemoteProxy (single-upstream consumers) the only validated values are empty (auto-select) or the name of that single upstream. The legacy spec.incomingAuth.authzConfig.inline.primaryUpstreamProvider location is read as a backward-compatibility fallback for one release with a Warning event of reason AuthzPrimaryUpstreamProviderDeprecated.

  • Pre-validate the referenced authz ConfigMap in the controller and distinguish NotFound from other parse/validation failures via two condition reasons on AuthConfigured: the existing shared AuthzConfigMapNotFound and a new AuthzConfigMapInvalid. This mirrors the diagnostic MCPRemoteProxy emits today and gives users a status-level error before the converter fails opaquely.

  • Validate the resolved PrimaryUpstreamProvider (from either source) against the declared embedded auth server upstreams so an unresolvable provider is rejected at convert time.

CRD compatibility: not a breaking change. The legacy spec.incomingAuth.authzConfig.inline.primaryUpstreamProvider location continues to work, with a Warning event emitted whenever it is read; planned removal one release after the deprecation cycle. The new fields on AuthzConfigRef (GroupClaimName, RoleClaimName, GroupEntityType) and on EmbeddedAuthServerConfig (PrimaryUpstreamProvider) are additive.

Closes #4919
Closes #5208
Closes #5277

Large PR Justification

~950 lines of code are relevant, everything else is auto-generated.

@blkt blkt self-assigned this May 15, 2026
@github-actions github-actions Bot added the size/XL Extra large PR: 1000+ lines changed label May 15, 2026
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

How to unblock this PR:

Add a section to your PR description with the following format:

## Large PR Justification

[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:

Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.

See our Contributing Guidelines for more details.


This review will be automatically dismissed once you add the justification section.

@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels May 15, 2026
@github-actions github-actions Bot dismissed their stale review May 15, 2026 08:58

Large PR justification has been provided. Thank you!

@github-actions
Copy link
Copy Markdown
Contributor

✅ Large PR justification has been provided. The size review has been dismissed and this PR can now proceed with normal review.

@blkt
Copy link
Copy Markdown
Contributor Author

blkt commented May 15, 2026

Heads up on dropping InlineAuthzConfig.PrimaryUpstreamProvider. Flagging because it's a breaking change.

The four Cedar fields touched here split into two buckets:

  • Policy content (Policies, EntitiesJSON`): lives wherever the policies live.
  • JWT-claim mapping (PrimaryUpstreamProvider, GroupClaimName, RoleClaimName, GroupEntityType): describes how Cedar reads the token, a separate concern from the policies themselves.

So the second bucket went onto AuthzConfigRef directly. Same field, same semantics, regardless of whether you're inline or configMap.

The non-breaking alternative was just leaving PrimaryUpstreamProvider on InlineAuthzConfig. Two reasons I didn't:

  • configMap users would still have no spec-level knob; they'd have to mutate the ConfigMap, and in our enterprise flow that's owned by an external controller.
  • Two YAML paths for the same value, plus ExplicitPrimaryUpstreamProvider() keeps an inline-only branch that's easy to miss.

The cost is the schema break: anyone setting spec.incomingAuth.authzConfig.inline.primaryUpstreamProvider today needs to move it up one level. If that feels too aggressive, happy to do a deprecation window instead (keep the inline field for a release, fall through in the helper, remove next minor). Small isolated change.

@blkt blkt force-pushed the feat/vmcp-resolve-authz-configmap branch from bf2fafb to 7895464 Compare May 15, 2026 09:37
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels May 15, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented May 15, 2026

Codecov Report

❌ Patch coverage is 92.94872% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.40%. Comparing base (0a741f7) to head (0925161).
⚠️ Report is 9 commits behind head on main.

Files with missing lines Patch % Lines
cmd/thv-operator/pkg/controllerutil/authz.go 85.29% 3 Missing and 2 partials ⚠️
...perator/controllers/virtualmcpserver_controller.go 90.90% 2 Missing and 1 partial ⚠️
cmd/thv-operator/pkg/vmcpconfig/converter.go 95.23% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5290      +/-   ##
==========================================
- Coverage   68.41%   68.40%   -0.02%     
==========================================
  Files         620      620              
  Lines       63316    63401      +85     
==========================================
+ Hits        43317    43367      +50     
- Misses      16772    16798      +26     
- Partials     3227     3236       +9     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@blkt blkt force-pushed the feat/vmcp-resolve-authz-configmap branch from 7895464 to f8c1696 Compare May 15, 2026 11:40
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels May 15, 2026
@blkt blkt force-pushed the feat/vmcp-resolve-authz-configmap branch from f8c1696 to 12defec Compare May 19, 2026 10:45
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels May 19, 2026
A `VirtualMCPServer` with `spec.incomingAuth.authzConfig.type: configMap`
silently produced a vmcp `config.yaml` that referenced the unresolved
`configMap` type token. The vmcp binary's `AuthzConfig` validator only
accepts `cedar` or `none`, so the pod crashed in `CrashLoopBackOff` at
startup. Inline authz also silently dropped `GroupClaimName`,
`RoleClaimName`, `GroupEntityType`, and `EntitiesJSON`, so any enterprise
Cedar policy that walked a `Client → ClaimGroup → PlatformRole` hierarchy
denied every request because the runtime Cedar authorizer built
`THVGroup::` parents while the entity store contained `ClaimGroup::`
entities.

Wire the configMap path end-to-end, plumb the four missing fields
through both source paths, and move `PrimaryUpstreamProvider` onto the
auth server config where it belongs:

  * Extract `LoadAuthzConfigFromConfigMap` as the shared fetch/parse/
    validate helper in `controllerutil`; `AddAuthzConfigOptions` now
    delegates to it. The vMCP converter calls the same helper so the
    failure modes match the `MCPServer`/`MCPRemoteProxy` runner path.

  * Extend `pkg/vmcp/config.AuthzConfig` with `EntitiesJSON`,
    `GroupClaimName`, `RoleClaimName`, `GroupEntityType`, and forward
    all four into `cedar.ConfigOptions` in the Cedar middleware factory.
    `EntitiesJSON` defaults to `"[]"` when unset to preserve the
    historical Cedar contract.

  * Lift the source-agnostic Cedar JWT-claim mapping fields
    (`GroupClaimName`, `RoleClaimName`, `GroupEntityType`) onto
    `AuthzConfigRef` so they work identically for inline and configMap
    users. For configMap users the parsed payload provides the default
    and the spec-level field overrides when set.

  * Move `PrimaryUpstreamProvider` onto `EmbeddedAuthServerConfig`
    (`spec.authServerConfig.primaryUpstreamProvider` on
    `VirtualMCPServer`). The field describes which upstream IDP token
    Cedar reads claims from, which is a property of the embedded auth
    server, not of the authz policies. Placing it next to
    `upstreamProviders` co-locates the choice with the set it selects
    from. Reachable from `MCPExternalAuthConfig` of type
    `embeddedAuthServer` as well; on `MCPServer`/`MCPRemoteProxy`
    (single-upstream consumers) the only validated values are empty
    (auto-select) or the name of that single upstream. The legacy
    `spec.incomingAuth.authzConfig.inline.primaryUpstreamProvider`
    location is read as a backward-compatibility fallback for one
    release with a `Warning` event of reason
    `AuthzPrimaryUpstreamProviderDeprecated`.

  * Pre-validate the referenced authz `ConfigMap` in the controller
    and distinguish `NotFound` from other parse/validation failures
    via two condition reasons on `AuthConfigured`: the existing shared
    `AuthzConfigMapNotFound` and a new `AuthzConfigMapInvalid`. This
    mirrors the diagnostic `MCPRemoteProxy` emits today and gives
    users a status-level error before the converter fails opaquely.

  * Validate the resolved `PrimaryUpstreamProvider` (from either
    source) against the declared embedded auth server upstreams so an
    unresolvable provider is rejected at convert time.

CRD compatibility: not a breaking change. The legacy
`spec.incomingAuth.authzConfig.inline.primaryUpstreamProvider`
location continues to work, with a Warning event emitted whenever it
is read; planned removal one release after the deprecation cycle.
The new fields on `AuthzConfigRef` (`GroupClaimName`, `RoleClaimName`,
`GroupEntityType`) and on `EmbeddedAuthServerConfig`
(`PrimaryUpstreamProvider`) are additive.

Closes #4919
Closes #5208
Closes #5277
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels May 19, 2026
The relocation of `PrimaryUpstreamProvider` onto `EmbeddedAuthServerConfig`
in the parent commit introduced two new behaviours and surfaced one
pre-existing coverage gap. None of the three were captured by a test. Add
focused unit tests that pin each behaviour so a future refactor cannot
silently drop it.

* `*VirtualMCPServer.ExplicitPrimaryUpstreamProvider()` is the central
  precedence helper: canonical `spec.authServerConfig.primaryUpstreamProvider`
  wins, deprecated `spec.incomingAuth.authzConfig.inline.primaryUpstreamProvider`
  is the fallback, otherwise empty. A unit test covers all four
  combinations (canonical only, deprecated only, both, neither) and locks
  the `fromDeprecated` flag the caller uses to gate the deprecation event.

* `AuthzConfigRef.DeprecatedInlinePrimaryUpstreamProvider()` is the helper
  the advisory paths on `MCPServer` and `MCPRemoteProxy` read. A unit
  test covers the nil receiver, nil `Inline` subtree, inline without the
  field, and inline with the field set.

* `MCPServer.validateAuthzPrimaryUpstreamProviderIgnored` and the
  equivalent on `MCPRemoteProxy` set the `AuthzPrimaryUpstreamProviderIgnored`
  advisory condition when the deprecated field is set on a CR that has
  no embedded auth server to act on it. Both functions existed before
  this PR but had no test coverage at all. The relocation of the field
  did not change their behaviour, but the absence of coverage means a
  regression there would go unnoticed. Add table-driven tests that
  exercise the True-on-set, absent-when-unset, and stale-cleared cases.

* `VirtualMCPServerReconciler.validateAuthzUpstreamAvailable` emits a
  `Warning` event with reason `AuthzPrimaryUpstreamProviderDeprecated`
  whenever it resolves the primary from the deprecated location, so
  operators see a kubectl-visible signal even when the deprecated value
  still validates. A test using `events.NewFakeRecorder` confirms the
  event fires for the deprecated path, does not fire for the canonical
  path, and does not fire when neither location is set.

No production code changes; only tests.
@blkt blkt force-pushed the feat/vmcp-resolve-authz-configmap branch from 12defec to 0925161 Compare May 19, 2026 12:54
@github-actions github-actions Bot removed the size/XL Extra large PR: 1000+ lines changed label May 19, 2026
@github-actions github-actions Bot added the size/XL Extra large PR: 1000+ lines changed label May 19, 2026
Copy link
Copy Markdown
Collaborator

@ChrisJBurns ChrisJBurns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multi-Agent Consensus Review

Agents consulted: kubernetes-expert, go-security-reviewer, code-reviewer, toolhive-expert, go-expert-developer, documentation-writer

Consensus Summary

# Finding Consensus Severity Action
1 New AuthzConfigRef claim-mapping fields dropped on MCPServer/MCPRemoteProxy 10/10 HIGH Fix
2 virtualmcpserver-api.md still pins primaryUpstreamProvider at deprecated location 10/10 HIGH Fix
3 virtualmcpserver-kubernetes-guide.md troubleshooting uses deprecated location 10/10 HIGH Fix
4 EmbeddedAuthServerConfig.PrimaryUpstreamProvider neither validated nor wired on MCPServer/MCPRemoteProxy 10/10 MEDIUM Fix or soften docstring
5 Comment drift on InlineAuthzConfig.PrimaryUpstreamProvider deprecation 8/10 MEDIUM Fix
6 AuthzConfigMapInvalid not mirrored on MCPRemoteProxy validateK8sRefs 8/10 MEDIUM Follow-up
7 No migration guidance for primaryUpstreamProvider move 8/10 MEDIUM Note
8 Naming collision: ExplicitPrimaryUpstreamProvider on two types, different signatures 7/10 MEDIUM Rename
9 convertAuthzConfig silently accepts unknown authz types 7/10 LOW Guard
10 LoadAuthzConfigFromConfigMap dedicated tests missing 8/10 LOW Tests
11 validateAuthzConfigMapRef doesn't run cedar.ExtractConfig 9/10 LOW Add check
12 Architecture docs not updated 8/10 LOW Optional
13 AuthzConfigMapInvalid undocumented in troubleshooting 7/10 LOW Note
14 Deprecation Warning event fires every reconcile 7/10 LOW Optional
15 Deprecation Warning event not emitted in no-auth-server branch 7/10 LOW Emit

Overall

The PR fixes two real bugs — the vmcp CrashLoopBackOff on type: configMap, and the EntitiesJSON="[]" silent-deny on Cedar policies that walk Client → ClaimGroup → PlatformRole. The vMCP-side wiring is clean: a shared LoadAuthzConfigFromConfigMap, distinct condition reasons (AuthzConfigMapNotFound vs new AuthzConfigMapInvalid), defense-in-depth at convert time, and thorough table-driven tests that lock the deprecation precedence and the event emission contract.

The findings cluster around one structural concern: the new fields are added to shared types (AuthzConfigRef for the JWT-claim mapping; EmbeddedAuthServerConfig for PrimaryUpstreamProvider) but the runtime wiring and the user-facing docs only land on the VirtualMCPServer consumer. MCPServer and MCPRemoteProxy users who follow the PR's own deprecation notes will hit silent no-ops. None of this blocks the vMCP fix shipping, but the field additions to AuthzConfigRef are particularly worth resolving before merge — either by threading them through addAuthzInlineConfigOptions and the configMap branch of AddAuthzConfigOptions, or by tightening the docstrings to say "VirtualMCPServer-only" with an explicit advisory condition on the other CRDs.

The remaining items are smaller: a few low-severity test gaps, a default: arm missing on convertAuthzConfig, the docs sync for the user-facing guides, and a couple of minor consistency nits in the validator.

Documentation

  • docs/operator/virtualmcpserver-api.md:90 still lists primaryUpstreamProvider only at authzConfig.inline.primaryUpstreamProvider. The canonical location (spec.authServerConfig.primaryUpstreamProvider) and the new groupClaimName / roleClaimName / groupEntityType fields on AuthzConfigRef are absent. [F2]
  • docs/operator/virtualmcpserver-kubernetes-guide.md:631-637 troubleshooting block pins primary upstream at the deprecated inline location; following the guide will produce AuthzPrimaryUpstreamProviderDeprecated events on every reconcile. [F3]
  • No CHANGELOG / RELEASE-NOTES / migration doc captures the deprecation. Consider a short "Migration: primaryUpstreamProvider location" subsection in the kubernetes guide. The Warning event message text is currently the only guidance and it's buried in kubectl get events. [F7]
  • docs/arch/10-virtual-mcp-architecture.md (or 09-operator-architecture.md) doesn't describe the new shared LoadAuthzConfigFromConfigMap resolver, the spec-vs-ConfigMap precedence for claim-mapping fields, or the PrimaryUpstreamProvider relocation. Optional follow-up. [F12]
  • The AuthConfigured=False reasons (AuthzConfigMapNotFound, AuthzConfigMapInvalid) aren't called out in the "Authorization Policy Errors" troubleshooting block of the kubernetes guide. [F13]

Generated with Claude Code

@@ -213,59 +282,14 @@ func AddAuthzConfigOptions(
return addAuthzInlineConfigOptions(authzRef, options)

case mcpv1beta1.AuthzConfigTypeConfigMap:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[F1 HIGH — consensus 10/10] New AuthzConfigRef claim-mapping fields silently dropped on MCPServer/MCPRemoteProxy runner path.

The PR adds GroupClaimName, RoleClaimName, and GroupEntityType to the shared AuthzConfigRef type (used by all three CRDs) and the CRD docstring explicitly promises spec-over-ConfigMap override semantics. The vMCP converter (convertAuthzConfig) threads them through to cedar.ConfigOptions. But neither addAuthzInlineConfigOptions (line 169, the inline runner path) nor this configMap branch applies them — the loaded *authz.Config is passed straight to runner.WithAuthzConfig. Setting any of these fields on an MCPServer or MCPRemoteProxy is a silent no-op at runtime, which contradicts both the field documentation and the "no silent no-ops" rule in .claude/rules/go-style.md.

Fix: thread the three fields into the cedar.ConfigOptions built by addAuthzInlineConfigOptions; in this configMap branch, after LoadAuthzConfigFromConfigMap, apply the spec-wins-over-ConfigMap override the vMCP converter performs (mutating a copy of cfg). Add a unit test asserting the values flow into the built runner.WithAuthzConfig argument — there is currently zero coverage of the new fields on the MCPServer/MCPRemoteProxy code paths.

Raised by code-reviewer, toolhive-expert, go-security-reviewer.

// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=63
// +kubebuilder:validation:Pattern=`^[a-z0-9]([a-z0-9-]*[a-z0-9])?$`
PrimaryUpstreamProvider string `json:"primaryUpstreamProvider,omitempty"`
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[F4 MEDIUM — consensus 10/10] This field is unreachable on the MCPServer/MCPRemoteProxy path.

The docstring claims "on MCPServer and MCPRemoteProxy (single-upstream consumers) the only validated values are empty or the name of the sole upstream" — but no validator in mcpserver_controller.go, mcpremoteproxy_controller.go, or pkg/controllerutil/ references EmbeddedAuthServerConfig.PrimaryUpstreamProvider, and the field is never threaded into the proxyrunner configuration. The structural advisory AuthzPrimaryUpstreamProviderIgnored only inspects the legacy InlineAuthzConfig.PrimaryUpstreamProvider. Result: users following the deprecation guidance "move the value to embeddedAuthServer.primaryUpstreamProvider" get no error, no warning, and no effect.

Fix: either (a) wire the new field through the MCPServer/MCPRemoteProxy → MCPExternalAuthConfig consumer and emit AuthzPrimaryUpstreamProviderIgnored when set on those CRDs, or (b) soften this docstring to say the field is structurally meaningful only on VirtualMCPServer until further notice.

Raised by kubernetes-expert, toolhive-expert.

//
// Deprecated: this field has moved to
// spec.authServerConfig.primaryUpstreamProvider on VirtualMCPServer (or to
// the embeddedAuthServer.primaryUpstreamProvider field of a referenced
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[F5 MEDIUM — consensus 8/10] The deprecation block self-contradicts.

The first half points users to "the embeddedAuthServer.primaryUpstreamProvider field of a referenced MCPExternalAuthConfig for MCPServer and MCPRemoteProxy". The second half then says "the field remains structurally meaningless (no embedded auth server runtime) and continues to surface the AuthzPrimaryUpstreamProviderIgnored advisory". Both can't be true — either there's a canonical replacement on MCPServer/MCPRemoteProxy or there isn't. Per .claude/rules/go-style.md "Keep Comments Synchronized With Code".

Fix: pick one stance. If MCPServer/MCPRemoteProxy have no canonical home today (matches the actual wiring — see F4), reword to: "this field has always been a structural no-op on MCPServer and MCPRemoteProxy (no embedded auth server). The deprecation only applies to VirtualMCPServer, where it has moved to spec.authServerConfig.primaryUpstreamProvider."

Raised by code-reviewer.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read the two sentences as describing different fields rather than contradicting each other:

  • Sentence 1 ("this field has moved to … embeddedAuthServer.primaryUpstreamProvider … for MCPServer and MCPRemoteProxy") names the new canonical location.
  • Sentence 2 ("On MCPServer and MCPRemoteProxy the field remains structurally meaningless") describes this very field — the deprecated InlineAuthzConfig.PrimaryUpstreamProvider.

Both can be true simultaneously: the new location is the canonical replacement, and the deprecated form on this type was always a no-op on MCPServer/MCPRemoteProxy.

The real issue underneath this is your F4: the docstring promises a canonical replacement on those CR types that isn't wired. If F4 is resolved by wiring the field, this docstring is accurate as-written; if F4 is resolved by softening, the same softening probably needs to apply here. I'd treat this as subsumed by F4.

// Precedence: the canonical location wins if set; the deprecated location is
// read only as a backward-compatibility fallback. Callers should emit a
// Warning event when fromDeprecated is true.
func (r *VirtualMCPServer) ExplicitPrimaryUpstreamProvider() (name string, fromDeprecated bool) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[F8 MEDIUM — consensus 7/10] ExplicitPrimaryUpstreamProvider now exists on two related types with different signatures.

AuthzConfigRef.ExplicitPrimaryUpstreamProvider() was renamed to DeprecatedInlinePrimaryUpstreamProvider(), and this new method takes its name with a different return shape ((name string, fromDeprecated bool)). A reader grepping for ExplicitPrimaryUpstreamProvider will hit two definitions and can't tell at a glance which receiver/return is in play — exactly the kind of API confusion that bites navigation.

Fix: either rename this to ResolveExplicitPrimaryUpstreamProvider() or PrimaryUpstreamProviderWithSource() to signal the precedence-resolution semantic, or fold the deprecated-fallback logic into a single method AuthzConfigRef.PrimaryUpstreamProvider() (name string, fromDeprecated bool) and have VirtualMCPServer compose the two locations.

Raised by code-reviewer.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the rename creates a collision — it removes one:

  • (*VirtualMCPServer).ExplicitPrimaryUpstreamProvider() (string, bool) is the only definition of ExplicitPrimaryUpstreamProvider.
  • (*AuthzConfigRef).DeprecatedInlinePrimaryUpstreamProvider() string is the only definition of DeprecatedInlinePrimaryUpstreamProvider.

A grep for either name lands on exactly one definition; the receiver disambiguates at every call site. The pre-rename world was ambiguous (ExplicitPrimaryUpstreamProvider on two receivers), but the rename is precisely the fix. I'd leave this as-is.

// condition; the caller is responsible for persisting status.
func (*MCPRemoteProxyReconciler) validateAuthzPrimaryUpstreamProviderIgnored(proxy *mcpv1beta1.MCPRemoteProxy) {
provider := proxy.Spec.AuthzConfig.ExplicitPrimaryUpstreamProvider()
provider := proxy.Spec.AuthzConfig.DeprecatedInlinePrimaryUpstreamProvider()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[F6 MEDIUM (parity) — consensus 8/10] MCPRemoteProxy lacks the new AuthzConfigMapInvalid reason.

MCPRemoteProxyReconciler.validateK8sRefs still routes every ConfigMap failure (NotFound, missing key, malformed payload, Validate-fail) through ConditionReasonAuthzConfigMapNotFound. After this PR, VirtualMCPServer distinguishes NotFound from Invalid by calling the shared LoadAuthzConfigFromConfigMap. A malformed but present authz ConfigMap on MCPRemoteProxy still leads to pod CrashLoopBackOff with no actionable status — the exact failure mode this PR is fixing on VMCP. Per .claude/rules/operator.md "Status Condition Parity".

Fix: in a follow-up PR (acceptable as a tracking issue linked from here), switch validateK8sRefs to use LoadAuthzConfigFromConfigMap and add ConditionReasonAuthzConfigMapInvalid in mcpremoteproxy_types.go. The shared loader makes this mechanical.

Raised by kubernetes-expert.

authzType := authzRef.Type
if authzType == authzLabelValueInline || authzType == mcpv1beta1.AuthzConfigTypeConfigMap {
authzType = "cedar"
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[F9 LOW — consensus 7/10] No default: arm on the switch — unknown authzRef.Type silently produces a Cedar config with zero policies.

The CRD enum (Enum: [configMap inline]) blocks unknown types at admission today, but this converter alone has no defense. A CLI dry-run, a webhook running against a stale schema, or a future authz type would walk past the inline/configMap cases with authz.Type = "cedar" and empty Policies. Cedar then denies every request silently.

Fix: add a default: arm that returns fmt.Errorf("unsupported authz config type: %q", authzRef.Type) — mirrors the equivalent guard in AddAuthzConfigOptions (controllerutil/authz.go line 295). The path-mapping comment two lines up ("other types pass through (e.g. 'configMap' → 'cedar' once we populate policies)") also no longer matches the code and should be tightened or dropped.

Raised by go-expert-developer.

//
// The returned *authz.Config is safe to embed directly into RunConfig (via
// runner.WithAuthzConfig) or to read field-by-field for the vMCP converter.
func LoadAuthzConfigFromConfigMap(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[F10 LOW — consensus 8/10] No dedicated unit tests for LoadAuthzConfigFromConfigMap.

The new helper has six distinct failure modes (nil ref, wrong type, missing name, nil client, NotFound, missing key, empty value, malformed payload, Validate() failure). The converter test (converter_authz_configmap_test.go) covers the apiserver-side errors (NotFound, missing key, empty, malformed) but not (a) nil/wrong-type authzRef, (b) nil client, or (c) the highest-value case — a payload that parses successfully but fails cfg.Validate() (e.g. zero policies). The latter is the only thing protecting Cedar from a permit-nothing config.

Fix: add a focused TestLoadAuthzConfigFromConfigMap in cmd/thv-operator/pkg/controllerutil/authz_test.go with table entries for each failure mode and at least one "valid JSON, fails Validate" case.

Raised by go-expert-developer.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The failure modes are covered, just indirectly via TestAddAuthzConfigOptions (which is LoadAuthzConfigFromConfigMap's sole public consumer):

  • nil / wrong-type authzRefauthz_test.go:329 ("Nil authz ref returns nil"), :367 ("Inline type with nil inline config returns error")
  • nil client → :431 ("ConfigMap type with nil client returns error")
  • valid JSON that fails cfg.Validate():542 ("ConfigMap type with invalid config returns error")
  • plus NotFound, missing key, empty value, custom key (lines 453, 476, 509, 616)

Every failure mode is exercised — just not under a TestLoadAuthzConfigFromConfigMap name. A direct test would be nice for explicit coverage but doesn't add new behavioral coverage. I'd treat this as nice-to-have rather than required.

// rather than as a generic conversion failure. Inline and absent authzConfig are no-ops.
//
// Mirrors the pattern in mcpremoteproxy_controller.go's validateK8sRefs.
func (r *VirtualMCPServerReconciler) validateAuthzConfigMapRef(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[F11 LOW — consensus 9/10] validateAuthzConfigMapRef discards the loaded config and bypasses cedar.ExtractConfig.

The pre-validator calls LoadAuthzConfigFromConfigMap and ignores the return value. The converter (convertAuthzConfig) then re-loads the same ConfigMap and additionally runs cedar.ExtractConfig on it. A payload that parses as a valid authz.Config but isn't Cedar-flavored (wrong type field, or a future HTTP authorizer) will pass this pre-validator → set AuthConfigured=True → fail at convert time with a different error. The two should agree on what "valid" means.

Fix: either (a) add a cedar.ExtractConfig(loaded) smoke check here, or (b) refactor LoadAuthzConfigFromConfigMap to assert Cedar-ness so both sites use the same definition. Add a test case for "valid authz.Config but type != cedarv1" returning AuthzConfigMapInvalid.

Raised by code-reviewer, kubernetes-expert.

@@ -623,14 +623,13 @@ func (*VirtualMCPServerReconciler) validateAuthzUpstreamAvailable(
// admission for the same "fail loudly instead of denying every request"
// reason as the configured-AS mismatch path below.
if vmcp.Spec.AuthServerConfig == nil {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[F15 LOW — consensus 7/10] Deprecation Warning event not emitted in the no-auth-server branch.

When vmcp.Spec.AuthServerConfig == nil, the validator calls vmcp.ExplicitPrimaryUpstreamProvider() and discards the fromDeprecated flag (explicitProvider, _ := ...). Lower at line 674, when AuthServerConfig != nil, the same call emits AuthzPrimaryUpstreamProviderDeprecated whenever fromDeprecated is true. So a user mid-migration (deprecated location set, auth server not yet configured) gets the hard reject but never sees the deprecation hint about the new field location.

Fix: emit the same AuthzPrimaryUpstreamProviderDeprecated Warning event in this branch before calling rejectAuthzAdmission, or extract a small helper that both branches call.

Raised by toolhive-expert.

// mismatch would cause Cedar to deny every request at runtime — fail loudly
// at admission instead.
explicitProvider, fromDeprecated := vmcp.ExplicitPrimaryUpstreamProvider()
if explicitProvider != "" && fromDeprecated && r.Recorder != nil {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[F14 LOW — consensus 7/10] Deprecation Warning event fires on every reconcile cycle while the deprecated field is set.

Kubernetes event aggregation will dedupe within 10 minutes, so the cardinality is bounded — but VMCP reconciles can be frequent (watch resync, dependent resource changes), so this will fire often during the deprecation window. The advisory condition AuthzPrimaryUpstreamProviderIgnored already uses the better pattern (metav1.Condition with LastTransitionTime → emit event on transition only).

Fix (optional): gate this Eventf on an AuthzPrimaryUpstreamProviderDeprecated status condition so the event fires only on first observation per generation. Acceptable to defer if the team is comfortable with event-aggregation dedup as the only throttle.

Raised by kubernetes-expert.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is OK as-is for now. Chris's own comment acknowledges that Event aggregation dedupes within ~10 minutes, so practical cardinality is bounded.

The cleaner fix — gate emission on the False→True transition of an AuthzPrimaryUpstreamProviderDeprecated condition — requires introducing that condition, which is a design decision (do we want a Deprecated condition mirroring the existing Ignored one?). If the team adds that condition later, this throttling falls out naturally.

I'd dismiss this as a nit until/unless we decide on the broader Deprecated-condition design.

Copy link
Copy Markdown
Contributor

@jhrozek jhrozek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One non-blocking nit from cross-checking the review pass.

return nil, fmt.Errorf(
"authz primaryUpstreamProvider %q does not match any upstream declared "+
"on spec.authServerConfig.upstreamProviders", explicit)
cedarCfg, err := cedar.ExtractConfig(loaded)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit / follow-up suggestion — not blocking.

The converter imports pkg/authz/authorizers/cedar solely to call ExtractConfig here. The runner-path consumer of the same shared loader (AddAuthzConfigOptions in controllerutil/authz.go) stays authorizer-agnostic and forwards *authz.Config opaquely. Adding a future authorizer means editing the converter; the runner path doesn't have that pressure.

Consider extracting a sibling helper in controllerutil — e.g. ExtractCedarAuthzOptions(*authz.Config) (*cedar.ConfigOptions, error) — so the converter imports only controllerutil, not cedar. Eliminates the layering smell and keeps Cedar-specific glue in one package.

Also relates to F1: when the spec-over-ConfigMap precedence logic is added to controllerutil/authz.go, it'd be a natural fit to land alongside the extracted helper so both code paths share precedence resolution.

@jhrozek
Copy link
Copy Markdown
Contributor

jhrozek commented May 19, 2026

Nit / drift-prevention observation — not blocking, doesn't anchor on a changed line so filing as a PR comment instead of inline.

ConditionTypeAuthzPrimaryUpstreamProviderIgnored at cmd/thv-operator/api/v1beta1/mcpserver_types.go:99 correctly describes its current behavior in the doc comment, but doesn't carry the deprecation lineage. This condition fires only for the deprecated spec.authzConfig.inline.primaryUpstreamProvider field, which per this PR is slated for removal one release after the deprecation cycle. When that field is removed, this condition (and the constant) should be removed in the same change — but a future maintainer reading just the comment block has no signal to do that cleanup.

Suggested options:

  1. Add a single sentence to the doc comment: "This condition fires only for the deprecated spec.authzConfig.inline.primaryUpstreamProvider; it should be removed together with that field at end of the deprecation cycle."
  2. Or rename the constant to carry the deprecation lineage in its name (e.g., ConditionTypeAuthzDeprecatedInlinePrimaryUpstreamProvider), which survives even if the comment drifts again.

Filing as a regular PR comment because line 99 is outside the diff hunks of this PR.

@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels May 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/XL Extra large PR: 1000+ lines changed

Projects

None yet

3 participants