Skip to content

Mirror MCPExternalAuthConfig Valid=False onto consumer CR conditions#5354

Open
tgrunnagle wants to merge 5 commits into
mainfrom
oss-obo-4_issue_5347
Open

Mirror MCPExternalAuthConfig Valid=False onto consumer CR conditions#5354
tgrunnagle wants to merge 5 commits into
mainfrom
oss-obo-4_issue_5347

Conversation

@tgrunnagle
Copy link
Copy Markdown
Contributor

Summary

Closes #5347.

Today, when an MCPExternalAuthConfig surfaces Valid=False (for example,
an obo-typed config that the default OBO handler rejects with
Reason=EnterpriseRequired in upstream-only builds), the failure is visible
only on the referenced MCPExternalAuthConfig resource. The consumer CR
(MCPServer, MCPRemoteProxy, VirtualMCPServer) reports only a generic
dispatch error, so users must inspect a different object to learn why their
workload will not deploy. This violates the Status Condition Parity rule in
.claude/rules/operator.md and was deferred from #5345 as cross-cutting work.

This PR closes that parity gap:

  • Adds a shared external_auth_mirror.go primitive
    (mirroredExternalAuthConfigInvalid, mirroredInvalidExternalAuthConfigError,
    mirroredReasonFromError) so all three consumer reconcilers probe the source
    the same way and emit the same typed error.
  • MCPServer mirrors the source's reason and message onto
    ExternalAuthConfigValidated=False, and clears any stale mirror once the
    source heals (it has no downstream True setter, so without the heal path
    the False would persist after the user fixed the spec).
  • MCPRemoteProxy mirrors the same condition; it has an existing downstream
    ExternalAuthConfigValidated=True writer that overwrites any stale False
    naturally, so no separate heal branch is needed.
  • VirtualMCPServer mirrors the source reason onto each per-backend
    DiscoveredAuthConfig-<name> / BackendAuthConfig-<name> condition via a
    new AuthConfigError.Reason field. setAuthConfigConditions falls back to
    ConversionFailed when no source reason is present, preserving existing
    behavior for non-mirror failures. vMCP heals naturally because
    setAuthConfigConditions rebuilds per-backend conditions on every reconcile.

Type of change

  • Bug fix
  • New feature
  • Refactoring (no behavior change)
  • Dependency update
  • Documentation
  • Other (describe):

Test plan

  • Unit tests (task test)
  • Linting (task lint-fix)

Added or extended unit coverage for every site touched:

  • mcpserver_externalauth_test.go — table-driven mirror behavior on the
    MCPServer reconciler covering Valid=False/EnterpriseRequired,
    Valid=False/InvalidConfig, Valid=True (no mirror), no Valid condition
    (no mirror), and the heal-clearing regression guard for a stale mirror left
    by a prior reconcile.
  • mcpremoteproxy_controller_test.go — adds the Valid=False mirror case to
    the existing TestHandleExternalAuthConfig table on the MCPRemoteProxy
    reconciler.
  • virtualmcpserver_externalauth_test.go — exercises the discovered and
    inline backend paths, verifying the per-backend condition surfaces the
    mirrored reason and that the AuthConfigError.Reason field round-trips
    through convertBackendAuthConfigToVMCP -> buildOutgoingAuthConfig ->
    setAuthConfigConditions.
  • virtualmcpserver_vmcpconfig_test.go — direct coverage of
    setAuthConfigConditions reason-selection: mirrored reason when present,
    ConversionFailed fallback otherwise, across the default, discovered, and
    inline backend code paths.

API Compatibility

  • This PR does not break the v1beta1 API, OR the api-break-allowed
    label is applied and the migration guidance is described above.

This PR adds an unexported Reason field to the in-process AuthConfigError
struct and reads existing Status.Conditions on consumer CRs. No CRD schema
changes.

Changes

File Change
cmd/thv-operator/controllers/external_auth_mirror.go New shared probe, typed error, and reason-extraction helper used by all three consumer reconcilers.
cmd/thv-operator/controllers/mcpserver_controller.go Mirror Valid=False onto ExternalAuthConfigValidated; clear stale mirror when the source heals.
cmd/thv-operator/controllers/mcpremoteproxy_controller.go Mirror Valid=False onto ExternalAuthConfigValidated; downstream True setter handles healing.
cmd/thv-operator/controllers/virtualmcpserver_controller.go Add AuthConfigError.Reason; mirror in discover/convert paths; propagate via buildOutgoingAuthConfig; surface in setAuthConfigConditions.
cmd/thv-operator/controllers/mcpserver_externalauth_test.go Table-driven mirror tests including heal-clearing regression guard.
cmd/thv-operator/controllers/mcpremoteproxy_controller_test.go Added Valid=False mirror case.
cmd/thv-operator/controllers/virtualmcpserver_externalauth_test.go New mirror cases across discovered + inline backend paths.
cmd/thv-operator/controllers/virtualmcpserver_vmcpconfig_test.go Direct reason-selection coverage for setAuthConfigConditions.

Does this introduce a user-facing change?

Yes. When a consumer CR references an MCPExternalAuthConfig whose Valid
condition is False, the consumer's own Status.Conditions now carries a
parallel False condition with the same reason (for example
EnterpriseRequired) and message. Users no longer need to inspect a
different object to diagnose the failure on the workload they applied.

Special notes for reviewers

Stacked PR. This PR is the next in a stacked series. The upstream PRs in
the stack are #5332, #5338, and #5345 — PR #5345 is the immediate predecessor
and shares branch history. Until #5345 merges to main, the diff against
main will include those earlier commits. The in-scope commits for this PR
are the three HEAD-most:

  1. Mirror MCPExternalAuthConfig Valid=False onto consumer CR conditions
  2. Address code review feedback on consumer-CR mirror
  3. Tighten mirror probe API; trim duplicated error prefix

Please review after #5345 merges so the diff narrows to just this PR's scope.

Deferred AC4 (envtest integration tests). Issue #5347's AC4 calls for
integration tests under cmd/thv-operator/test-integration/mcp-external-auth/
that verify propagation end-to-end against envtest with a CRD-enum bypass.
Per in-session direction we deferred this to #5329 to avoid adding a
setup-envtest dependency here. Once #5329 admits obo at the CRD layer,
the integration test no longer needs a CRD-enum bypass and is substantially
simpler to write. Unit-level coverage in this PR exercises every consumer
reconciler's mirror behavior, the heal path on MCPServer, the shared
probe, the reason-propagation helpers, and the per-backend condition wiring
in setAuthConfigConditions.

Known follow-up: MCPServerEntry. MCPServerEntry (the per-backend
entry inside VirtualMCPServer) has the same parity gap, but it is
out-of-scope for #5347's explicit AC list
(MCPServer/MCPRemoteProxy/VirtualMCPServer). Suggest filing a separate
issue to track the entry-level mirror; not filing from this PR.

Merge ordering. This PR must merge before #5329 so production users see
the EnterpriseRequired failure on the resource they applied (the consumer
CR) the moment apiserver-level admission of obo-typed configs is lit up.

@github-actions github-actions Bot added the size/XL Extra large PR: 1000+ lines changed label May 20, 2026
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

How to unblock this PR:

Add a section to your PR description with the following format:

## Large PR Justification

[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:

Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.

See our Contributing Guidelines for more details.


This review will be automatically dismissed once you add the justification section.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 20, 2026

Codecov Report

❌ Patch coverage is 89.41176% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.65%. Comparing base (483e91d) to head (c9d0a70).

Files with missing lines Patch % Lines
...perator/controllers/virtualmcpserver_controller.go 77.77% 7 Missing and 1 partial ⚠️
...d/thv-operator/controllers/mcpserver_controller.go 80.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5354      +/-   ##
==========================================
- Coverage   68.68%   68.65%   -0.03%     
==========================================
  Files         625      626       +1     
  Lines       63298    63365      +67     
==========================================
+ Hits        43474    43505      +31     
- Misses      16583    16619      +36     
  Partials     3241     3241              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@tgrunnagle tgrunnagle changed the base branch from main to oss-obo-3_issue_5328 May 20, 2026 21:21
@github-actions github-actions Bot added size/L Large PR: 600-999 lines changed and removed size/XL Extra large PR: 1000+ lines changed labels May 20, 2026
@github-actions github-actions Bot dismissed their stale review May 20, 2026 21:21

PR size has been reduced below the XL threshold. Thank you for splitting this up!

@github-actions
Copy link
Copy Markdown
Contributor

✅ PR size has been reduced below the XL threshold. The size review has been dismissed and this PR can now proceed with normal review. Thank you for splitting this up!

Copy link
Copy Markdown
Contributor Author

@tgrunnagle tgrunnagle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multi-agent review summary

Four specialist agents (k8s-conditions, go-style, tests, general-quality) reviewed the diff. Codex cross-review skipped — codex CLI not installed on the reviewer host.

Consensus summary

# Severity File Title
F1 HIGH mcpserver_controller.go MCPServer leaks mirrored False when ExternalAuthConfigRef is removed from spec
F2 MEDIUM mcpremoteproxy_controller.go MCPRemoteProxy mirror lacks defensive clear-on-heal
F3 MEDIUM virtualmcpserver_externalauth_test.go Wrapped-error round-trip through buildOutgoingAuthConfig not exercised
F4 MEDIUM mcpserver_controller.go Asymmetric receiver vs. free-function design between parallel mirror helpers
F5 LOW mcpserver_controller.go Doubled MCPExternalAuthConfig prefix in error message
F6 LOW external_auth_mirror.go Empty source Reason copied verbatim onto consumer condition
F7 LOW external_auth_mirror.go Probe doc comment is verbose; rationale duplicated across file
F8 LOW mcpremoteproxy_controller_test.go MCPRemoteProxy mirror test does not validate message propagation
F9 LOW mcpserver_externalauth_test.go ObservedGeneration is set but not asserted in tests
F10 LOW mcpserver_externalauth_test.go Remove ref and source NotFound heal edges not tested (related to F1)

Holistic assessment

What the PR does well

  • Closes a real Status Condition Parity gap targeted by .claude/rules/operator.md.
  • Shared probe + typed error in external_auth_mirror.go keeps the per-reconciler call sites small and uniform.
  • Good unit coverage on the three consumer reconcilers (AC1–AC3), including a regression guard for MCPServer heal-via-source-flip.
  • errors.As round-trip through mirroredReasonFromError is the right idiom for threading the source reason across the vMCP AuthConfigError aggregator.
  • ObservedGeneration is set correctly on every new condition write.

Risks / trade-offs to weigh

  • MCPServer.handleExternalAuthConfig returns early when the ref is removed without clearing the mirrored False, so the condition outlives its cause in that healing scenario. This is the one issue that should be addressed before merge — it directly contradicts the PR's stated heal-path intent.
  • MCPRemoteProxy heal correctness depends on a downstream True-writer being reachable on every code path, vs. MCPServer's defensive clear-on-heal. Pick one model and apply it to both.
  • Two near-identical helpers have asymmetric shapes — method on *MCPServerReconciler with unused receiver vs. free function for MCPRemoteProxy.
  • The mirroredReasonFromError test does not validate the errors.As walk through the fmt.Errorf("... %w", err) wrap that production code actually applies in buildOutgoingAuthConfig — a future %w%v regression would not be caught.

Recommendation

Fix F1 (one-line meta.RemoveStatusCondition in the no-ref branch of MCPServerReconciler.handleExternalAuthConfig plus a regression test). The MEDIUM findings (F2–F4) are worth addressing in the same iteration; the LOW findings are quality follow-ups the author can accept or defer.

AC traceability vs. #5347

  • AC1 (MCPServer) — addressed, but heal-path-on-ref-removal is incomplete (F1).
  • AC2 (MCPRemoteProxy) — addressed; heal contract is fragile (F2).
  • AC3 (VirtualMCPServer) — addressed; reason-propagation correctness depends on wrapped-error contract that lacks a test (F3).
  • AC4 (envtest integration) — explicitly deferred to #5329 per PR body; out of scope.
  • AC5 (merge ordering) — process item; out of code-review scope.

Generated with Claude Code.

Comment thread cmd/thv-operator/controllers/mcpserver_controller.go Outdated
Comment thread cmd/thv-operator/controllers/mcpremoteproxy_controller.go Outdated
Comment thread cmd/thv-operator/controllers/virtualmcpserver_externalauth_test.go
Comment thread cmd/thv-operator/controllers/mcpserver_controller.go Outdated
Comment thread cmd/thv-operator/controllers/mcpserver_controller.go Outdated
Comment thread cmd/thv-operator/controllers/external_auth_mirror.go Outdated
Comment thread cmd/thv-operator/controllers/external_auth_mirror.go
Comment thread cmd/thv-operator/controllers/mcpremoteproxy_controller_test.go Outdated
Comment thread cmd/thv-operator/controllers/mcpserver_externalauth_test.go
Comment thread cmd/thv-operator/controllers/mcpserver_externalauth_test.go
tgrunnagle added a commit that referenced this pull request May 21, 2026
Addresses #5354 review comments:
- HIGH cmd/thv-operator/controllers/mcpserver_controller.go (F1):
  the previous mirror added in this branch handled the
  source-Valid-flips-True heal path, but the no-ref early-return
  in handleExternalAuthConfig never reached the helper, so removing
  the spec.externalAuthConfigRef left a stale Valid=False/EnterpriseRequired
  on the consumer indefinitely. Clear the condition in the no-ref
  branch and in both NotFound branches so the mirror does not
  outlive its cause.
- MEDIUM cmd/thv-operator/controllers/mcpremoteproxy_controller.go (F2):
  mirror helper now defensively clears its own condition on the
  healed path, matching MCPServer. The downstream True-writer at
  the end of handleExternalAuthConfig still sets the
  ConditionReasonMCPRemoteProxyExternalAuthConfigValid reason on
  success; the defensive clear closes the gap if a future control-flow
  change adds an early return between this helper and that writer.
- LOW cmd/thv-operator/controllers/mcpserver_externalauth_test.go (F10):
  two new regression-guard tests for the heal paths uncovered by F1
  — spec.externalAuthConfigRef removed, referenced source NotFound.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tgrunnagle added a commit that referenced this pull request May 21, 2026
Addresses #5354 review comments:
- MEDIUM cmd/thv-operator/controllers/virtualmcpserver_externalauth_test.go (F3):
  add a wrapped-error round-trip assertion so a future %w -> %v
  regression in buildOutgoingAuthConfig can't silently break
  per-backend reason propagation.
- MEDIUM cmd/thv-operator/controllers/mcpserver_controller.go (F4):
  move both consumer mirror functions to free functions in
  external_auth_mirror.go (mirrorInvalidOnMCPServer /
  mirrorInvalidOnRemoteProxy) instead of one method with an unused
  receiver and one suffixed free function. Single canonical home.
- LOW cmd/thv-operator/controllers/mcpserver_controller.go (F5):
  drop the "referenced MCPExternalAuthConfig is invalid" prefix from
  the typed error's Error(); the outer wrap on each consumer carries
  the namespaced name. Rendered messages now read
  "MCPExternalAuthConfig ns/name: invalid (Reason): Message".
- LOW cmd/thv-operator/controllers/external_auth_mirror.go (F6):
  substitute a fallback reason when the source surfaces Valid=False
  with an empty Reason so the apiserver patch cannot fail validation
  on a corrupt/partial source status. Add a TestMirroredExternalAuthConfigInvalid
  subtest for the empty-Reason input.
- LOW cmd/thv-operator/controllers/external_auth_mirror.go (F7):
  trim the probe doc; package-level comment carries the #5347
  motivation once, the per-function docs are now contract-only.
- LOW cmd/thv-operator/controllers/mcpremoteproxy_controller_test.go (F8):
  add expectedCondMessage to the test table and assert it on the
  OBO mirror row.
- LOW cmd/thv-operator/controllers/mcpserver_externalauth_test.go (F9):
  set non-zero Generation on the test fixtures and assert
  ObservedGeneration on the mirrored condition. Same gap covered on
  the MCPRemoteProxy row.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels May 21, 2026
Base automatically changed from oss-obo-3_issue_5328 to main May 21, 2026 16:42
tgrunnagle and others added 5 commits May 21, 2026 09:46
Closes the Status Condition Parity gap noted in #5347. Before this
change, an obo-typed MCPExternalAuthConfig surfaced
Valid=False/EnterpriseRequired only on the referenced resource. The
consumer CR (MCPServer, MCPRemoteProxy, VirtualMCPServer) showed only
the generic dispatch failure, leaving users to inspect the referenced
config to understand why their workload would not deploy.

Each consumer reconciler now inspects the referenced config's Valid
condition. When it is False, the reconciler mirrors the source's reason
and message onto its own ExternalAuthConfigValidated condition (or
per-backend DiscoveredAuthConfig/BackendAuthConfig condition for vMCP).

This must merge before #5329 lights up apiserver-level admission of
obo-typed configs so production users see the failure on the resource
they applied. The envtest integration test for this propagation is
deferred to #5329 to avoid adding a setup-envtest dependency here; once
#5329 admits "obo" at the CRD layer, the test no longer needs a
CRD-enum bypass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two changes from the code review on the previous commit:

1. (correctness) MCPServer.handleExternalAuthConfig now clears any
   stale ExternalAuthConfigValidated=False the mirror previously wrote
   when the referenced source has healed (Valid=True or absent).
   Without this, the False condition outlived its cause: the user fixed
   their spec but the mirror stayed sticky because MCPServer has no
   downstream True setter on this condition. Adds a regression-guard
   test case.

2. (parity) The mirror probe, typed error, and reason-extraction
   helper now live in a shared external_auth_mirror.go file instead
   of being duplicated across the three consumer reconcilers. All
   three reconcilers (MCPServer, MCPRemoteProxy, VirtualMCPServer)
   call the same primitive and surface the same typed error so
   downstream consumers like buildOutgoingAuthConfig can react
   without string-matching.

MCPRemoteProxy already sets ExternalAuthConfigValidated=True
downstream and so does not need the heal-clearing branch — its
existing True writer overwrites any stale mirror. VirtualMCPServer
heals naturally because setAuthConfigConditions removes stale
per-backend conditions on each reconcile.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two changes from the second round of code review feedback on the
consumer-CR mirror:

1. mirroredExternalAuthConfigInvalid now returns the typed
   *mirroredInvalidExternalAuthConfigError directly (nil when the
   source is healed) instead of a (reason, error) tuple. Callers no
   longer need errors.As to recover the message field, and the
   pointer-or-nil shape removes the awkward "discard the bool"
   pattern at the MCPServer and MCPRemoteProxy call sites. The typed
   value still satisfies the error interface so vMCP's
   convertBackendAuthConfigToVMCP can keep passing it through opaque
   error-returning APIs and recover the reason later via
   mirroredReasonFromError.

2. MCPServer and MCPRemoteProxy's outer error wrap no longer repeats
   the typed error's "referenced MCPExternalAuthConfig is invalid"
   prefix, so messages render as a single sentence instead of a
   doubled phrase. The namespaced name is preserved through %w.

Updates the mirror probe's unit test to the new return shape and
adds a round-trip assertion that the typed pointer flows through
mirroredReasonFromError.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses #5354 review comments:
- HIGH cmd/thv-operator/controllers/mcpserver_controller.go (F1):
  the previous mirror added in this branch handled the
  source-Valid-flips-True heal path, but the no-ref early-return
  in handleExternalAuthConfig never reached the helper, so removing
  the spec.externalAuthConfigRef left a stale Valid=False/EnterpriseRequired
  on the consumer indefinitely. Clear the condition in the no-ref
  branch and in both NotFound branches so the mirror does not
  outlive its cause.
- MEDIUM cmd/thv-operator/controllers/mcpremoteproxy_controller.go (F2):
  mirror helper now defensively clears its own condition on the
  healed path, matching MCPServer. The downstream True-writer at
  the end of handleExternalAuthConfig still sets the
  ConditionReasonMCPRemoteProxyExternalAuthConfigValid reason on
  success; the defensive clear closes the gap if a future control-flow
  change adds an early return between this helper and that writer.
- LOW cmd/thv-operator/controllers/mcpserver_externalauth_test.go (F10):
  two new regression-guard tests for the heal paths uncovered by F1
  — spec.externalAuthConfigRef removed, referenced source NotFound.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses #5354 review comments:
- MEDIUM cmd/thv-operator/controllers/virtualmcpserver_externalauth_test.go (F3):
  add a wrapped-error round-trip assertion so a future %w -> %v
  regression in buildOutgoingAuthConfig can't silently break
  per-backend reason propagation.
- MEDIUM cmd/thv-operator/controllers/mcpserver_controller.go (F4):
  move both consumer mirror functions to free functions in
  external_auth_mirror.go (mirrorInvalidOnMCPServer /
  mirrorInvalidOnRemoteProxy) instead of one method with an unused
  receiver and one suffixed free function. Single canonical home.
- LOW cmd/thv-operator/controllers/mcpserver_controller.go (F5):
  drop the "referenced MCPExternalAuthConfig is invalid" prefix from
  the typed error's Error(); the outer wrap on each consumer carries
  the namespaced name. Rendered messages now read
  "MCPExternalAuthConfig ns/name: invalid (Reason): Message".
- LOW cmd/thv-operator/controllers/external_auth_mirror.go (F6):
  substitute a fallback reason when the source surfaces Valid=False
  with an empty Reason so the apiserver patch cannot fail validation
  on a corrupt/partial source status. Add a TestMirroredExternalAuthConfigInvalid
  subtest for the empty-Reason input.
- LOW cmd/thv-operator/controllers/external_auth_mirror.go (F7):
  trim the probe doc; package-level comment carries the #5347
  motivation once, the per-function docs are now contract-only.
- LOW cmd/thv-operator/controllers/mcpremoteproxy_controller_test.go (F8):
  add expectedCondMessage to the test table and assert it on the
  OBO mirror row.
- LOW cmd/thv-operator/controllers/mcpserver_externalauth_test.go (F9):
  set non-zero Generation on the test fixtures and assert
  ObservedGeneration on the mirrored condition. Same gap covered on
  the MCPRemoteProxy row.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tgrunnagle tgrunnagle force-pushed the oss-obo-4_issue_5347 branch from 01aae22 to c9d0a70 Compare May 21, 2026 16:50
@tgrunnagle tgrunnagle marked this pull request as ready for review May 21, 2026 16:51
@github-actions github-actions Bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels May 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/L Large PR: 600-999 lines changed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Mirror MCPExternalAuthConfig.Valid=False/EnterpriseRequired to consumer CRs

3 participants