make imageproxyserver handle other registry names#276
Conversation
6dd7302 to
789a6c3
Compare
b24adab to
18ae968
Compare
d8f6bd1 to
d83197e
Compare
f5650e7 to
b446367
Compare
WalkthroughAdds a registry Validator, multi-registry-aware image proxy with auth detection and caching, controller helpers for parsing/resolving OCI images, an in-memory test registry, and wires the Validator into PXE/HTTP reconcilers and the image proxy. Tests and Dockerfile updated to include the new registry package. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Controller as PXE/HTTP<br/>Reconciler
participant Validator as Registry<br/>Validator
participant Registry as OCI<br/>Registry
Client->>Controller: Trigger reconcile with ServerBootConfig (imageRef)
Controller->>Validator: ValidateImageRegistry(imageRef)
Validator->>Validator: ExtractRegistryDomain & check ALLOWED/BLOCKED
alt Registry allowed
Validator-->>Controller: nil
Controller->>Registry: Resolve manifests (FindManifestByArchitecture)
Registry-->>Controller: Manifest / blobs
Controller-->>Client: Update status / boot details
else Registry denied
Validator-->>Controller: error
Controller-->>Client: Reconcile fails (validation error)
end
sequenceDiagram
participant HTTPClient as Client
participant ProxyServer as ImageProxy
participant Cache as RegistryCache
participant TargetRegistry as OCI<br/>Registry
HTTPClient->>ProxyServer: Request image resource (imageRef)
ProxyServer->>ProxyServer: Extract RegistryDomain & Repository
ProxyServer->>ProxyServer: Validator.IsRegistryAllowed(registry)
alt Registry denied
ProxyServer-->>HTTPClient: 403 Forbidden
else Registry allowed
ProxyServer->>Cache: getOrDetectRegistry(registry, repo)
alt cached
Cache-->>ProxyServer: RegistryInfo
else not cached
ProxyServer->>TargetRegistry: Probe /v2/ (HEAD/GET)
TargetRegistry-->>ProxyServer: 401 + WWW-Authenticate
ProxyServer->>ProxyServer: parse auth (Bearer/None)
alt Bearer
ProxyServer->>TargetRegistry: Request token
TargetRegistry-->>ProxyServer: Bearer token
end
Cache->>Cache: store RegistryInfo
Cache-->>ProxyServer: RegistryInfo
end
ProxyServer->>TargetRegistry: Proxy request (with token if any)
TargetRegistry-->>ProxyServer: manifest/blob
ProxyServer->>ProxyServer: modify response (rewrite headers, follow redirects)
ProxyServer-->>HTTPClient: proxied image data
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 10
🧹 Nitpick comments (3)
internal/controller/suite_test.go (1)
109-113: DuplicateAddToSchemecall — likely a merge artifact.
bootv1alpha1.AddToScheme(scheme.Scheme)is called on line 109 and again on lines 112–113. The second call is redundant (AddToScheme is idempotent) but should be removed to avoid confusion.Suggested fix
Expect(bootv1alpha1.AddToScheme(scheme.Scheme)).NotTo(HaveOccurred()) Expect(metalv1alpha1.AddToScheme(scheme.Scheme)).NotTo(HaveOccurred()) - - err = bootv1alpha1.AddToScheme(scheme.Scheme) - Expect(err).NotTo(HaveOccurred())🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/suite_test.go` around lines 109 - 113, Remove the redundant bootv1alpha1.AddToScheme call and its associated err/Expect block: the earlier Expect(bootv1alpha1.AddToScheme(scheme.Scheme)).NotTo(HaveOccurred()) already registers the scheme, so delete the later err = bootv1alpha1.AddToScheme(scheme.Scheme) and Expect(err).NotTo(HaveOccurred()) lines to avoid the duplicate call and unused err variable while keeping the initial AddToScheme checks (and the metalv1alpha1.AddToScheme) intact.test/registry/registry.go (2)
134-147:architectureparameter is accepted but unused — no multi-arch index is created.Both
PushPXEImageandPushPXEImageOldFormataccept anarchitecturestring butpushPXEManifeststores only a flat manifest, not anocispec.Indexwith platform entries. This means the image index / multi-arch code path ingetUKIDigestFromNestedManifest(checkingMediaTypeImageIndexandplatform.Architecture) is never exercised by tests.If multi-arch testing is planned, consider creating an
ocispec.Indexwrapping per-arch manifests. If not needed now, at minimum drop the unused parameter to avoid confusion.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/registry/registry.go` around lines 134 - 147, PushPXEImage and PushPXEImageOldFormat accept an unused architecture parameter which prevents exercising the multi-arch code path in getUKIDigestFromNestedManifest; either remove the architecture parameter from these methods and all callers, or modify MockRegistry.pushPXEManifest (called by PushPXEImage/PushPXEImageOldFormat) to create an ocispec.Index that wraps per-architecture manifests instead of a flat manifest: for each architecture build a manifest for MediaTypeKernel/MediaTypeInitrd (or old types), add them to an ocispec.Index with platform entries setting platform.Architecture to the provided architecture string, and store the index so getUKIDigestFromNestedManifest can detect MediaTypeImageIndex and platform.Architecture.
215-228: Digest is computed for a zero-value manifest on tag-miss before the existence check.When the tag lookup on line 218 misses,
manifestis a zero-valueocispec.Manifest. Lines 220–222 marshal and hash it before the!existscheck on line 225. Moving the existence check earlier avoids unnecessary work and makes the logic clearer.Suggested fix
} else { // Look up by tag imageRef := fmt.Sprintf("%s:%s", name, reference) manifest, exists = r.manifests[imageRef] + if !exists { + http.Error(w, "manifest not found", http.StatusNotFound) + return + } // Calculate digest for Content-Digest header manifestBytes, _ := json.Marshal(manifest) dgst := digest.FromBytes(manifestBytes) manifestDigest = dgst.String() } - if !exists { - http.Error(w, "manifest not found", http.StatusNotFound) - return - }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/registry/registry.go` around lines 215 - 228, Move the existence check above the manifest marshaling and digest computation to avoid hashing a zero-value manifest when the tag lookup fails: after resolving manifest and exists from r.manifests (the imageRef branch in the registry handler), immediately return the 404 if !exists, and only then perform json.Marshal(manifest), digest.FromBytes(manifestBytes) and set manifestDigest; update references to manifestBytes/manifestDigest accordingly so they are only computed when manifest actually exists.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/controller/serverbootconfiguration_http_controller.go`:
- Around line 221-227: The current nil-check in getUKIDigestFromNestedManifest
silently skips registry validation; change this to a fail-closed behavior by
returning an error if r.RegistryValidator is nil instead of proceeding, and
otherwise call r.RegistryValidator.ValidateImageRegistry(imageRef) and propagate
any error (use fmt.Errorf to wrap it); reference the
ServerBootConfigurationHTTPReconciler type, its RegistryValidator field, and the
ValidateImageRegistry method for locating the code; if you intentionally want
backward compatibility instead, add a clear comment above the nil check
explaining the rationale and intended rollout plan.
- Around line 202-218: The image ref parsing in constructUKIURL mis-splits
references with registry ports; replace the fragile LastIndex(":") logic with a
proper OCI reference parser (e.g., github.com/distribution/reference's
ParseNormalizedNamed / Reference.Parse) to reliably extract name and tag/digest,
or at minimum validate that the substring after the last ':' does not contain
'/' (treating it as a port) before using it as the tag; update constructUKIURL
to produce correct imageName and imageVersion values used when calling
getUKIDigestFromNestedManifest and when formatting ukiURL with r.ImageServerURL.
In `@internal/controller/serverbootconfiguration_pxe_controller.go`:
- Around line 226-233: The current tag parsing uses only strings.LastIndex on
config.Spec.Image which misparses registry-port forms like
"localhost:5000/repo"; update the logic to require the tag-colon to appear after
the last '/' (i.e. find lastSlash := strings.LastIndex(config.Spec.Image, "/")
and then find a colon after that position), and only split into imageName and
imageVersion when that colon exists and is after lastSlash; otherwise return the
"invalid image format: missing tag" error. Reference the symbols
config.Spec.Image, lastColon, imageName, imageVersion (and add lastSlash) when
implementing the fix.
In `@internal/controller/suite_test.go`:
- Around line 164-169: The HTTP reconciler instantiation is missing the
Architecture field, so set Architecture: runtime.GOARCH on the
ServerBootConfigurationHTTPReconciler (the same way it's set for the PXE
reconciler) before calling SetupWithManager; update the object literal that
constructs ServerBootConfigurationHTTPReconciler (fields: Client, Scheme,
ImageServerURL, RegistryValidator) to include Architecture: runtime.GOARCH so
r.Architecture is populated and multi-arch image index matching works.
In `@internal/registry/validation_test.go`:
- Around line 151-158: The subtests are currently only calling
t.Setenv("ALLOWED_REGISTRIES", tt.allowList) and t.Setenv("BLOCKED_REGISTRIES",
tt.blockList) when those values are non-empty, which lets ambient env values
leak into “empty config” cases; always call t.Setenv for both keys inside each
t.Run regardless of tt.allowList/tt.blockList so empty strings are explicitly
set, and apply the same unconditional t.Setenv changes to the other loop (the
one around lines 250–257) to prevent flakiness.
In `@internal/registry/validation.go`:
- Around line 27-41: The registry extraction functions (notably
ExtractRegistryDomain) currently return different strings for Docker Hub
equivalents (e.g., "registry-1.docker.io" vs "docker.io"), which breaks
exact-match policy checks; update ExtractRegistryDomain and the other
registry-extraction helpers in this file (the blocks around lines 44-55 and
59-66) to normalize aliases to a single canonical value (choose one, e.g.,
"docker.io") and apply normalization after computing potentialRegistry (also
handle hostless references by returning the canonical docker.io), ensure you
lowercase the result and map any known aliases like "registry-1.docker.io" and
"index.docker.io" to the chosen canonical name before returning so policy
evaluation uses a single consistent registry identifier.
In `@server/imageproxyserver.go`:
- Around line 277-281: Replace the current placeholder in the AuthBasic branch
(the "case AuthBasic" block in server/imageproxyserver.go) so the proxy reads
credentials from the configured Kubernetes Secret for the registry, builds an
Authorization header with "Basic " + base64(username:password), injects that
header into the outbound registry request, and proceeds instead of returning
StatusNotImplemented; log the registry domain and only return an error (401/502)
if the secret is missing or malformed. Ensure you reference the same secret
name/config used elsewhere in the service, decode the secret values safely,
create the header before the HTTP request is made, and preserve existing logging
semantics (use log.Info/log.Error) when credentials are missing or invalid.
- Around line 89-93: The registry auth cache is keyed only by registry so
different repository-scoped token URLs (built from realm, service, scope
variables) can collide; update the caching logic to key tokens by the full token
URL or include the repository/service/scope in the cache key instead of just the
registry. Locate the code that constructs the token request URL (uses realm,
service, scope) and the registry auth cache lookup/insert and change the key to
either the formatted URL returned by the token URL builder or a composite key of
registry+service+scope to ensure tokens are cached per scope.
- Around line 315-319: The buildModifyResponse function is leaking upstream
registry bearer tokens by copying bearerToken into response headers; remove the
resp.Header.Set("Authorization", "Bearer "+bearerToken) behavior and ensure any
Authorization header from the upstream response is removed
(resp.Header.Del("Authorization")) so client responses never contain registry
credentials; update buildModifyResponse to stop injecting the bearerToken and
strip Authorization before returning.
- Around line 97-103: The outbound HTTP calls in detectRegistryAuth (the initial
http.Get to targetURL), the subsequent http.Get(tokenURL), and the ad-hoc
&http.Client{} used before client.Do() must use explicit timeouts; replace plain
http.Get usages and the bare http.Client with a shared configured *http.Client
that has a sensible Timeout (or use a request-specific context with deadline)
and use that client for GET and Do calls (reference detectRegistryAuth, tokenURL
usage, and the client variable) so all external HTTP requests cannot hang
indefinitely.
---
Nitpick comments:
In `@internal/controller/suite_test.go`:
- Around line 109-113: Remove the redundant bootv1alpha1.AddToScheme call and
its associated err/Expect block: the earlier
Expect(bootv1alpha1.AddToScheme(scheme.Scheme)).NotTo(HaveOccurred()) already
registers the scheme, so delete the later err =
bootv1alpha1.AddToScheme(scheme.Scheme) and Expect(err).NotTo(HaveOccurred())
lines to avoid the duplicate call and unused err variable while keeping the
initial AddToScheme checks (and the metalv1alpha1.AddToScheme) intact.
In `@test/registry/registry.go`:
- Around line 134-147: PushPXEImage and PushPXEImageOldFormat accept an unused
architecture parameter which prevents exercising the multi-arch code path in
getUKIDigestFromNestedManifest; either remove the architecture parameter from
these methods and all callers, or modify MockRegistry.pushPXEManifest (called by
PushPXEImage/PushPXEImageOldFormat) to create an ocispec.Index that wraps
per-architecture manifests instead of a flat manifest: for each architecture
build a manifest for MediaTypeKernel/MediaTypeInitrd (or old types), add them to
an ocispec.Index with platform entries setting platform.Architecture to the
provided architecture string, and store the index so
getUKIDigestFromNestedManifest can detect MediaTypeImageIndex and
platform.Architecture.
- Around line 215-228: Move the existence check above the manifest marshaling
and digest computation to avoid hashing a zero-value manifest when the tag
lookup fails: after resolving manifest and exists from r.manifests (the imageRef
branch in the registry handler), immediately return the 404 if !exists, and only
then perform json.Marshal(manifest), digest.FromBytes(manifestBytes) and set
manifestDigest; update references to manifestBytes/manifestDigest accordingly so
they are only computed when manifest actually exists.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
Dockerfilecmd/main.gointernal/controller/serverbootconfiguration_http_controller.gointernal/controller/serverbootconfiguration_http_controller_test.gointernal/controller/serverbootconfiguration_pxe_controller.gointernal/controller/serverbootconfiguration_pxe_controller_test.gointernal/controller/suite_test.gointernal/registry/validation.gointernal/registry/validation_test.goserver/imageproxyserver.gotest/registry/registry.go
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
server/imageproxyserver.go (2)
361-369:⚠️ Potential issue | 🟡 Minor
replaceResponseoverwrites headers withSetinstead ofAdd, losing multi-valued redirect response headers.Line 364 uses
originalResp.Header.Set(name, value)inside the inner loop overvalues. SinceSetreplaces all previous values for that key, only the last value of each multi-valued header survives. This is likely a bug carried over from before, but it's worth noting.🐛 Proposed fix
func replaceResponse(originalResp, redirectResp *http.Response) { + // Clear original headers before copying redirect headers + for name := range originalResp.Header { + originalResp.Header.Del(name) + } for name, values := range redirectResp.Header { for _, value := range values { - originalResp.Header.Set(name, value) + originalResp.Header.Add(name, value) } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/imageproxyserver.go` around lines 361 - 369, In replaceResponse, current header copying uses originalResp.Header.Set(name, value) inside the loop which overwrites previous values and drops multi-valued headers; change the header copy to use originalResp.Header.Add(name, value) when iterating redirectResp.Header values so all header entries are preserved, while still assigning originalResp.Body = redirectResp.Body and originalResp.StatusCode = redirectResp.StatusCode as before.
321-326:⚠️ Potential issue | 🟠 MajorBearer token leaked to redirect targets via
copyHeaders.When the registry returns a 307 redirect (typically to S3/GCS/CDN for blob storage),
copyHeaders(resp.Request.Header, redirectReq.Header)on line 325 copies all headers from the outbound proxy request—includingAuthorization: Bearer <token>set bybuildDirector. This leaks registry credentials to third-party storage backends.Standard OCI clients (e.g., Docker, containerd) strip the
Authorizationheader when following cross-domain redirects. This proxy should do the same.🔒 Proposed fix — strip Authorization on cross-host redirects
redirectReq, err := http.NewRequest("GET", location.String(), nil) if err != nil { return err } copyHeaders(resp.Request.Header, redirectReq.Header) + // Strip auth header when redirecting to a different host (e.g., CDN/S3) + if location.Host != resp.Request.URL.Host { + redirectReq.Header.Del("Authorization") + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/imageproxyserver.go` around lines 321 - 326, The redirect handling currently copies all headers from resp.Request to redirectReq via copyHeaders, leaking Authorization set by buildDirector to third-party redirect targets; modify the logic around redirect handling (the block that creates redirectReq and calls copyHeaders) to detect cross-host redirects by comparing resp.Request.URL.Host (or resp.Request.Host) with redirectReq.URL.Host (or the parsed location.Host) and, when they differ, remove or avoid copying the "Authorization" header (and other sensitive auth headers if present) before making the redirected request; keep Authorization intact for same-host redirects.
♻️ Duplicate comments (1)
internal/controller/serverbootconfiguration_http_controller.go (1)
205-209: Same fix needed here as the PXE digest/validator path.Line 206-Line 208 has the same
name:versionreconstruction and nil-dereference pattern; apply the same@digesthandling and fail-closed nil guard here too.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/serverbootconfiguration_http_controller.go` around lines 205 - 209, In getUKIDigestFromNestedManifest, avoid reconstructing imageRef as "name:version" that can drop an `@digest` and risk nil deref; instead detect and preserve an existing digest (image@sha256:...) when building imageRef, and add a nil-guard / fail-closed check after any registry manifest lookup so you return an error if the manifest or digest is nil. Update logic around RegistryValidator.ValidateImageRegistry and subsequent manifest handling to prefer the `@digest` form and propagate errors when digest/manifest is missing (mirror the fix applied in the PXE digest/validator path).
🧹 Nitpick comments (8)
test/registry/registry.go (1)
141-145:architectureparameter is unused in mock push helpers.Line 141 and Line 149 accept
architecturebut ignore it, which can hide architecture-path test gaps. Either use it for index/platform fixtures or explicitly mark it unused.♻️ Minimal clarity fix
-func (r *MockRegistry) PushPXEImage(name, tag, architecture string) error { +func (r *MockRegistry) PushPXEImage(name, tag, _ string) error {-func (r *MockRegistry) PushPXEImageOldFormat(name, tag, architecture string) error { +func (r *MockRegistry) PushPXEImageOldFormat(name, tag, _ string) error {Also applies to: 149-153
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/registry/registry.go` around lines 141 - 145, The MockRegistry push helpers (e.g., MockRegistry.PushPXEImage) accept an architecture parameter but never use it; either thread the architecture into the mock manifest/index creation or explicitly mark it unused. Fix by updating MockRegistry.PushPXEImage (and the other mock push helpers around the same area) to pass architecture into pushPXEManifest (or alter pushPXEManifest to accept an architecture and set the manifest/index Platform.Architecture accordingly) so test fixtures reflect the platform, or if you prefer no behavioral change, explicitly consume the param (e.g., _ = architecture) in each method to signal intentional omission.internal/registry/validation_test.go (1)
68-149: Add a case-variant registry test to prevent policy regressions.Please add a
DOCKER.IO/...case (block-list mode) so case-sensitivity behavior is explicitly locked by tests.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/registry/validation_test.go` around lines 68 - 149, Add a test in TestIsRegistryAllowed to verify case-variant matching for block-list behavior: insert a test case into the tests slice (e.g. name "blocked with block list - DOCKER.IO", registry "DOCKER.IO", blockList "docker.io,registry-1.docker.io", allowList "", want false) so the test asserts that registry matching is case-insensitive for Registry validation; modify only the tests slice in TestIsRegistryAllowed and keep the rest of the table-driven test intact.internal/controller/serverbootconfig_helpers.go (3)
63-77: Unnecessary double-unmarshal of manifest data.The data is first unmarshaled into
ocispec.Manifest(line 70), and if the descriptor is an index, it's immediately re-unmarshaled intoocispec.Index(line 81). Sinceocispec.Indexandocispec.Manifestare different types, the first unmarshal is wasted work for the index path.Consider deferring the unmarshal until after the media type check:
♻️ Suggested refactor
func FindManifestByArchitecture(ctx context.Context, resolver remotes.Resolver, name string, desc ocispec.Descriptor, architecture string, enableCNAMECompat bool) (ocispec.Manifest, error) { manifestData, err := fetchContent(ctx, resolver, name, desc) if err != nil { return ocispec.Manifest{}, fmt.Errorf("failed to fetch manifest data: %w", err) } - var manifest ocispec.Manifest - if err := json.Unmarshal(manifestData, &manifest); err != nil { - return ocispec.Manifest{}, fmt.Errorf("failed to unmarshal manifest: %w", err) - } - // If not an index, return the manifest directly if desc.MediaType != ocispec.MediaTypeImageIndex { + var manifest ocispec.Manifest + if err := json.Unmarshal(manifestData, &manifest); err != nil { + return ocispec.Manifest{}, fmt.Errorf("failed to unmarshal manifest: %w", err) + } return manifest, nil }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/serverbootconfig_helpers.go` around lines 63 - 77, FindManifestByArchitecture currently unmarshals manifestData into ocispec.Manifest before checking desc.MediaType, causing a wasted unmarshal when the descriptor is an index; change the logic to first check desc.MediaType against ocispec.MediaTypeImageIndex (using the existing manifestData fetched by fetchContent), and only unmarshal into ocispec.Index when it is an index and into ocispec.Manifest when it is not; update subsequent uses of the unmarshaled value to refer to the correctly typed variable (ocispec.Index or ocispec.Manifest) instead of doing a double-unmarshal.
194-198: Use structured logging instead offmt.Printffor the close error.This is the only place in the file that uses
fmt.Printffor error reporting. It bypasses the structured logger used elsewhere in the codebase. Consider accepting a logger or usingslog, or at minimum switching tofmt.Fprintf(os.Stderr, ...)so it doesn't get lost in stdout.♻️ Suggested refactor — propagate via context logger or just ignore
Since close errors on read-only streams are almost always benign, the simplest fix is to just drop the printf:
defer func() { - if cerr := reader.Close(); cerr != nil { - fmt.Printf("failed to close reader: %v\n", cerr) - } + _ = reader.Close() }()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/serverbootconfig_helpers.go` around lines 194 - 198, The defer that closes reader uses fmt.Printf which breaks structured logging; replace that printf with the project's structured logger (e.g., use slog or the function's context/logger parameter) or at minimum write to stderr—capture cerr from reader.Close() and log it via the structured logger (e.g., slog.Error("failed to close reader", "err", cerr)) inside the existing defer anonymous function that checks cerr; alternatively, if close errors are benign here, simply drop the log call.
155-159: Passingnilas the error argument tolog.Error.
log.Error(nil, "Failed to decode object into Secret", ...)violates thelogrconvention where the first argument should be a non-nil error or the call should uselog.Infoinstead. Usingnilmakes structured log consumers (e.g., JSON loggers) emit a null error field.♻️ Suggested fix
- log.Error(nil, "Failed to decode object into Secret", "object", secret) + log.Info("Failed to decode object into Secret", "object", secret)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/serverbootconfig_helpers.go` around lines 155 - 159, The call log.Error(nil, "Failed to decode object into Secret", "object", secret) is passing nil as the error; change it so the logging follows logr conventions: either construct a real error (e.g., fmt.Errorf with the unexpected type) and pass it as the first argument to log.Error, or switch to log.Info with the same message and context. Update the call around secret/secretObj handling (the block that does secretObj, ok := secret.(*corev1.Secret)) to use one of these two approaches and include the offending value (secret) in the structured context.server/imageproxyserver.go (3)
96-101: Newhttp.Clientallocated per call todetectRegistryAuthandgetBearerToken.Lines 99 and 163 each create a fresh
http.Client. Sincehttp.Clientmanages an internal connection pool via itsTransport, creating one per call defeats connection reuse. Extract a shared package-level client with the desired timeout.♻️ Suggested refactor
+var httpClient = &http.Client{Timeout: 30 * time.Second} + func detectRegistryAuth(registryDomain, repository string) (*RegistryInfo, error) { targetURL := fmt.Sprintf("https://%s/v2/", registryDomain) - client := &http.Client{ - Timeout: 30 * time.Second, - } - resp, err := client.Get(targetURL) + resp, err := httpClient.Get(targetURL)Apply the same change in
getBearerTokenandbuildModifyResponse.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/imageproxyserver.go` around lines 96 - 101, detectRegistryAuth, getBearerToken and buildModifyResponse each allocate a new http.Client which prevents connection reuse; instead declare a package-level shared *http.Client with the desired Timeout and reuse it in those functions. Replace local client := &http.Client{Timeout: ...} in detectRegistryAuth, getBearerToken and buildModifyResponse with references to the shared client (e.g. reuseHttpClient or defaultHTTPClient), initialize that variable once (with Timeout set) at package scope (or in init()) and ensure all HTTP calls use it so the Transport/connection pool is reused.
64-76:extractParammay match substrings of other parameter names.If the header contains a parameter whose name ends with the search term (e.g., a hypothetical
"somerealm"), thestrings.Indexapproach would produce a false match. In practice, standardWWW-Authenticateheaders use well-defined parameter names (realm,service,scope), so this is low risk—but a more robust approach would prefix-match with a delimiter (comma/space).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/imageproxyserver.go` around lines 64 - 76, The extractParam function can false-match parameter names that end with the search term; update extractParam to only match param when it's a full name by checking the character before the match is either start-of-string or a delimiter (comma, space) and that the match is followed by =" (i.e., look for patterns like 'param="' with a proper delimiter prefix), then extract the quoted value as before; reference the extractParam(header, param string) function and ensure you still handle missing end-quote gracefully.
55-57: Registry cache grows unboundedly — no TTL or size limit.
registryCacheis a plain map with no eviction. In long-running deployments serving many distinct registry/repository combinations, this will grow indefinitely. Consider adding a TTL-based expiry (e.g., using a simple timestamp check) or switching to an LRU cache. A TTL also ensures that auth method changes at the registry are eventually picked up.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/imageproxyserver.go` around lines 55 - 57, The registryCache map currently grows unbounded; modify the caching logic around registryCache/registryCacheMutex and RegistryInfo to enforce eviction—either add a lastChecked/expiry timestamp field to RegistryInfo and on Get/Set check and remove expired entries (TTL-based), or replace registryCache with an LRU structure (e.g., container/list + map) to cap size and evict least-recently-used entries; ensure all accesses still use registryCacheMutex for concurrency and that on cache miss or expired entry you re-probe and update the cache with a refreshed timestamp.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/controller/serverbootconfiguration_pxe_controller.go`:
- Around line 226-230: The code in getLayerDigestsFromNestedManifest currently
builds imageRef with fmt.Sprintf("%s:%s", imageName, imageVersion) which breaks
when imageVersion is a digest (must use '@' for canonical digests) and also
calls r.RegistryValidator.ValidateImageRegistry without checking
r.RegistryValidator for nil; update getLayerDigestsFromNestedManifest to detect
whether imageVersion is a digest (e.g., strings.HasPrefix(imageVersion,
"sha256:") or using a ParseImageReference helper) and build imageRef using
"%s@%s" for digests and "%s:%s" for tags, and guard the validator call by first
checking if r.RegistryValidator != nil before invoking ValidateImageRegistry
(return a clear error if nil); apply the same pattern to the analogous function
in serverbootconfiguration_http_controller.go that performs registry validation.
In `@internal/registry/validation.go`:
- Around line 49-55: Normalize the input to lowercase before doing the switch
comparison in normalizeDockerHubDomain: call strings.ToLower on the incoming
domain at the start of the function, then switch on that lowercase value and
return the normalized "docker.io" as before; also apply the same lowercase
normalization to the other domain/registry normalization helper in this file
(the other function that normalizes registry domains) so all policy checks use
case-insensitive matching.
In `@server/imageproxyserver.go`:
- Around line 283-284: The code ignores url.Parse errors when creating proxyURL
from targetURL, which can yield a nil proxyURL and cause a panic in
buildDirector(proxyURL, ...) and proxy.ServeHTTP; update the call that sets
proxyURL (currently "proxyURL, _ := url.Parse(targetURL)") to handle the error:
call url.Parse and if it returns a non-nil error, log or return an HTTP error
(or propagate) and avoid calling buildDirector/proxy.ServeHTTP with a nil URL;
ensure you reference and check proxyURL and the error before using buildDirector
and proxy.ServeHTTP so malformed registryDomain/targetURL cannot trigger a
nil-pointer panic.
- Line 310: The bearerToken parameter on buildModifyResponse is now unused;
remove it from the function signature and all call sites to eliminate dead code.
Update the function declaration buildModifyResponse(...) to take no parameters
and adjust where it's invoked (e.g., the call at the former line 288) to stop
passing bearerToken; ensure any closures or referenced variables inside
buildModifyResponse (if any) still compile after the removal. Run a quick
build/test to confirm no remaining references to bearerToken in
buildModifyResponse or its callers.
---
Outside diff comments:
In `@server/imageproxyserver.go`:
- Around line 361-369: In replaceResponse, current header copying uses
originalResp.Header.Set(name, value) inside the loop which overwrites previous
values and drops multi-valued headers; change the header copy to use
originalResp.Header.Add(name, value) when iterating redirectResp.Header values
so all header entries are preserved, while still assigning originalResp.Body =
redirectResp.Body and originalResp.StatusCode = redirectResp.StatusCode as
before.
- Around line 321-326: The redirect handling currently copies all headers from
resp.Request to redirectReq via copyHeaders, leaking Authorization set by
buildDirector to third-party redirect targets; modify the logic around redirect
handling (the block that creates redirectReq and calls copyHeaders) to detect
cross-host redirects by comparing resp.Request.URL.Host (or resp.Request.Host)
with redirectReq.URL.Host (or the parsed location.Host) and, when they differ,
remove or avoid copying the "Authorization" header (and other sensitive auth
headers if present) before making the redirected request; keep Authorization
intact for same-host redirects.
---
Duplicate comments:
In `@internal/controller/serverbootconfiguration_http_controller.go`:
- Around line 205-209: In getUKIDigestFromNestedManifest, avoid reconstructing
imageRef as "name:version" that can drop an `@digest` and risk nil deref; instead
detect and preserve an existing digest (image@sha256:...) when building
imageRef, and add a nil-guard / fail-closed check after any registry manifest
lookup so you return an error if the manifest or digest is nil. Update logic
around RegistryValidator.ValidateImageRegistry and subsequent manifest handling
to prefer the `@digest` form and propagate errors when digest/manifest is missing
(mirror the fix applied in the PXE digest/validator path).
---
Nitpick comments:
In `@internal/controller/serverbootconfig_helpers.go`:
- Around line 63-77: FindManifestByArchitecture currently unmarshals
manifestData into ocispec.Manifest before checking desc.MediaType, causing a
wasted unmarshal when the descriptor is an index; change the logic to first
check desc.MediaType against ocispec.MediaTypeImageIndex (using the existing
manifestData fetched by fetchContent), and only unmarshal into ocispec.Index
when it is an index and into ocispec.Manifest when it is not; update subsequent
uses of the unmarshaled value to refer to the correctly typed variable
(ocispec.Index or ocispec.Manifest) instead of doing a double-unmarshal.
- Around line 194-198: The defer that closes reader uses fmt.Printf which breaks
structured logging; replace that printf with the project's structured logger
(e.g., use slog or the function's context/logger parameter) or at minimum write
to stderr—capture cerr from reader.Close() and log it via the structured logger
(e.g., slog.Error("failed to close reader", "err", cerr)) inside the existing
defer anonymous function that checks cerr; alternatively, if close errors are
benign here, simply drop the log call.
- Around line 155-159: The call log.Error(nil, "Failed to decode object into
Secret", "object", secret) is passing nil as the error; change it so the logging
follows logr conventions: either construct a real error (e.g., fmt.Errorf with
the unexpected type) and pass it as the first argument to log.Error, or switch
to log.Info with the same message and context. Update the call around
secret/secretObj handling (the block that does secretObj, ok :=
secret.(*corev1.Secret)) to use one of these two approaches and include the
offending value (secret) in the structured context.
In `@internal/registry/validation_test.go`:
- Around line 68-149: Add a test in TestIsRegistryAllowed to verify case-variant
matching for block-list behavior: insert a test case into the tests slice (e.g.
name "blocked with block list - DOCKER.IO", registry "DOCKER.IO", blockList
"docker.io,registry-1.docker.io", allowList "", want false) so the test asserts
that registry matching is case-insensitive for Registry validation; modify only
the tests slice in TestIsRegistryAllowed and keep the rest of the table-driven
test intact.
In `@server/imageproxyserver.go`:
- Around line 96-101: detectRegistryAuth, getBearerToken and buildModifyResponse
each allocate a new http.Client which prevents connection reuse; instead declare
a package-level shared *http.Client with the desired Timeout and reuse it in
those functions. Replace local client := &http.Client{Timeout: ...} in
detectRegistryAuth, getBearerToken and buildModifyResponse with references to
the shared client (e.g. reuseHttpClient or defaultHTTPClient), initialize that
variable once (with Timeout set) at package scope (or in init()) and ensure all
HTTP calls use it so the Transport/connection pool is reused.
- Around line 64-76: The extractParam function can false-match parameter names
that end with the search term; update extractParam to only match param when it's
a full name by checking the character before the match is either start-of-string
or a delimiter (comma, space) and that the match is followed by =" (i.e., look
for patterns like 'param="' with a proper delimiter prefix), then extract the
quoted value as before; reference the extractParam(header, param string)
function and ensure you still handle missing end-quote gracefully.
- Around line 55-57: The registryCache map currently grows unbounded; modify the
caching logic around registryCache/registryCacheMutex and RegistryInfo to
enforce eviction—either add a lastChecked/expiry timestamp field to RegistryInfo
and on Get/Set check and remove expired entries (TTL-based), or replace
registryCache with an LRU structure (e.g., container/list + map) to cap size and
evict least-recently-used entries; ensure all accesses still use
registryCacheMutex for concurrency and that on cache miss or expired entry you
re-probe and update the cache with a refreshed timestamp.
In `@test/registry/registry.go`:
- Around line 141-145: The MockRegistry push helpers (e.g.,
MockRegistry.PushPXEImage) accept an architecture parameter but never use it;
either thread the architecture into the mock manifest/index creation or
explicitly mark it unused. Fix by updating MockRegistry.PushPXEImage (and the
other mock push helpers around the same area) to pass architecture into
pushPXEManifest (or alter pushPXEManifest to accept an architecture and set the
manifest/index Platform.Architecture accordingly) so test fixtures reflect the
platform, or if you prefer no behavioral change, explicitly consume the param
(e.g., _ = architecture) in each method to signal intentional omission.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (10)
cmd/main.gogo.modinternal/controller/serverbootconfig_helpers.gointernal/controller/serverbootconfiguration_http_controller.gointernal/controller/serverbootconfiguration_pxe_controller.gointernal/controller/suite_test.gointernal/registry/validation.gointernal/registry/validation_test.goserver/imageproxyserver.gotest/registry/registry.go
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
server/imageproxyserver.go (1)
311-333:⚠️ Potential issue | 🟠 MajorAuthorization header leaked to redirect targets.
When a registry returns a 307 redirect (commonly to cloud storage like S3/GCS/Azure Blob),
copyHeaderson line 325 forwards all original request headers — includingAuthorization: Bearer <token>— to the storage backend. Standard HTTP clients strip theAuthorizationheader on cross-origin redirects for this reason.This leaks registry credentials to third-party storage providers.
🔒 Proposed fix: strip Authorization before following redirect
redirectReq, err := http.NewRequest("GET", location.String(), nil) if err != nil { return err } copyHeaders(resp.Request.Header, redirectReq.Header) + // Strip auth header on cross-origin redirect to prevent credential leakage + if redirectReq.URL.Host != resp.Request.URL.Host { + redirectReq.Header.Del("Authorization") + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/imageproxyserver.go` around lines 311 - 333, The redirect handler forwards all original headers via copyHeaders, leaking credentials; after creating redirectReq and calling copyHeaders(resp.Request.Header, redirectReq.Header) in the anonymous redirect function, explicitly remove sensitive auth headers (e.g., redirectReq.Header.Del("Authorization") and redirectReq.Header.Del("Proxy-Authorization")) before client.Do, so that replaceResponse and copyHeaders remain unchanged but no credentials are sent to the redirect target.
♻️ Duplicate comments (1)
server/imageproxyserver.go (1)
283-284: Ignored error fromurl.Parsecan cause a nil-pointer panic.
proxyURL, _ := url.Parse(targetURL)silently discards the error. If parsing fails,proxyURLisniland the subsequentbuildDirector/proxy.ServeHTTPcalls will panic.🐛 Proposed fix
- proxyURL, _ := url.Parse(targetURL) + proxyURL, err := url.Parse(targetURL) + if err != nil { + http.Error(w, "Internal Server Error", http.StatusInternalServerError) + log.Error(err, "Failed to parse target URL", "targetURL", targetURL) + return + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/imageproxyserver.go` around lines 283 - 284, The code currently ignores the error returned by url.Parse when creating proxyURL from targetURL, which can leave proxyURL nil and cause a panic in buildDirector / proxy.ServeHTTP; update the block around targetURL and url.Parse to capture the error, check if err != nil, log or write an appropriate HTTP error response (or return) and avoid calling buildDirector or proxy.ServeHTTP with a nil proxyURL; reference the variables and functions targetURL, url.Parse, proxyURL, buildDirector, and proxy.ServeHTTP when making the change.
🧹 Nitpick comments (6)
server/imageproxyserver.go (4)
185-215:RunImageProxyServeruseshttp.DefaultServeMux, which is a global.Using
http.HandleFuncregisters on the default mux. If any other code in the process also registers handlers (e.g., health checks, metrics), they'll share the same mux, which can cause routing conflicts. Consider using a dedicatedhttp.ServeMux:♻️ Proposed refactor
func RunImageProxyServer(imageProxyServerAddr string, k8sClient client.Client, validator *registry.Validator, log logr.Logger) { + mux := http.NewServeMux() - http.HandleFunc("/image", func(w http.ResponseWriter, r *http.Request) { + mux.HandleFunc("/image", func(w http.ResponseWriter, r *http.Request) { // ... }) - http.HandleFunc("/httpboot/", func(w http.ResponseWriter, r *http.Request) { + mux.HandleFunc("/httpboot/", func(w http.ResponseWriter, r *http.Request) { // ... }) log.Info("Starting image proxy server", "address", imageProxyServerAddr) - if err := http.ListenAndServe(imageProxyServerAddr, nil); err != nil { + if err := http.ListenAndServe(imageProxyServerAddr, mux); err != nil { // ... } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/imageproxyserver.go` around lines 185 - 215, RunImageProxyServer currently registers routes with the global http.DefaultServeMux via http.HandleFunc which can lead to conflicts; instead create a dedicated http.ServeMux inside RunImageProxyServer, register the "/image" and "/httpboot/" handlers on that mux (replacing http.HandleFunc calls), and pass the mux to http.ListenAndServe(imageProxyServerAddr, mux) so the server uses the local ServeMux and avoids interfering with other global handlers.
96-106: Consider sharing a singlehttp.Clientinstance across functions.Three separate
http.Clientinstances are created (lines 99, 163, 318) with the same 30s timeout. Allocating per-call prevents connection reuse and increases GC pressure. A package-level shared client would be more efficient.♻️ Proposed refactor
+var httpClient = &http.Client{Timeout: 30 * time.Second} + func detectRegistryAuth(registryDomain, repository string) (*RegistryInfo, error) { targetURL := fmt.Sprintf("https://%s/v2/", registryDomain) - client := &http.Client{ - Timeout: 30 * time.Second, - } - resp, err := client.Get(targetURL) + resp, err := httpClient.Get(targetURL)Apply the same change to
getBearerTokenandbuildModifyResponse.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/imageproxyserver.go` around lines 96 - 106, The code creates separate http.Client instances in detectRegistryAuth (client := &http.Client{Timeout: 30 * time.Second}) and similarly in getBearerToken and buildModifyResponse; replace these per-call clients with a single package-level reusable http.Client (e.g., httpClient) configured with the 30s Timeout (and optionally a shared Transport to enable keep-alives) and update detectRegistryAuth, getBearerToken, and buildModifyResponse to use that shared httpClient to allow connection reuse and reduce allocations/GC pressure.
55-57: Unbounded registry cache with no TTL or eviction.
registryCachegrows indefinitely. For a long-running server handling many distinct image references, this leaks memory. Additionally, cachedTokenURLvalues become permanently stale if a registry's auth configuration changes.Consider adding a TTL-based eviction (e.g., using a simple timestamp check or a library like
patrickmn/go-cache), or at minimum cap the cache size.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/imageproxyserver.go` around lines 55 - 57, The registryCache map (and registryCacheMutex) is unbounded and leaks memory and can serve stale TokenURL values; update the caching to include TTL-based eviction or a max size: add a timestamp field (e.g., LastSeen or ExpiresAt) to RegistryInfo, check that timestamp in every lookup path (where registryCache is read) and treat expired entries as missing so they are refreshed, and implement either a simple cleanup goroutine that periodically removes expired entries from registryCache or replace the map with a TTL cache (e.g., patrickmn/go-cache) and use that in place of registryCache while still protecting concurrent access with registryCacheMutex if required. Ensure TokenURL is refreshed on cache miss/expiry and that additions set the timestamp/TTL when storing in registryCache.
65-76:extractParamdoes not handle edge cases in WWW-Authenticate parsing.If the header contains multiple parameters with overlapping names (e.g.,
realmappearing in a value string), or uses single quotes, or has URL-encoded quotes inside values, this parser could extract incorrect values. For the common case of well-behaved OCI registries, this works fine, but be aware of the fragility.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/imageproxyserver.go` around lines 65 - 76, The extractParam function is fragile: it naively searches for param+"=\"" and can be fooled by the param name appearing inside a quoted value, single quotes, escaped/URL-encoded quotes, or overlapping names. Replace the simple index-based logic in extractParam with a robust token parser that scans the header left-to-right, splits parameters on unquoted commas, recognizes both double- and single-quoted values, handles escaped quotes inside quoted values (and URL-decoded contents if necessary), and returns the matching parameter's unescaped value (first match). Ensure the parser ignores occurrences of the param name inside quoted values and supports names that are substrings of other names. Use the function name extractParam to locate and update the implementation.internal/controller/serverbootconfig_helpers.go (2)
203-207: Use structured logging instead offmt.Printffor the close error.The rest of the codebase uses
logr-based structured logging.fmt.Printfhere is inconsistent and the output won't be captured by the logging framework.Since
fetchContentdoesn't receive a logger, consider either passing a context-derived logger, or simply ignoring the close error (common for read-only streams):♻️ Simpler alternative: silently discard close error
defer func() { - if cerr := reader.Close(); cerr != nil { - fmt.Printf("failed to close reader: %v\n", cerr) - } + _ = reader.Close() }()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/serverbootconfig_helpers.go` around lines 203 - 207, The defer currently prints close errors with fmt.Printf in fetchContent (defer calling reader.Close()), which is inconsistent with the logr-based logging; replace that fmt.Printf by either (a) removing/logically ignoring the close error for read-only streams (i.e., drop the cerr check and just call reader.Close()), or (b) if you prefer to surface the error, accept a context-derived logger or a logr.Logger into fetchContent and call logger.Error(cerr, "failed to close reader") instead; update the defer around reader.Close() and any function signature changes accordingly.
72-134:FindManifestByArchitecturecleanly consolidates manifest resolution logic.Good extraction of previously duplicated manifest traversal code from both PXE and HTTP controllers. The CNAME compatibility path is gated behind
enableCNAMECompat, keeping HTTP and PXE behavior distinct.One observation: the first
json.Unmarshalintoocispec.Manifest(line 79) for index descriptors will succeed but produce a partially meaningless struct (since index JSON doesn't match manifest schema). This is harmless because the result is discarded whendesc.MediaType == ocispec.MediaTypeImageIndex, but a brief comment would clarify intent.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/serverbootconfig_helpers.go` around lines 72 - 134, Add a short clarifying comment in FindManifestByArchitecture explaining why we first unmarshal manifestData into an ocispec.Manifest even when desc.MediaType may be ocispec.MediaTypeImageIndex: note that the initial json.Unmarshal into the variable manifest is safe and may yield a partial/unused struct for index payloads, and that the real index parsing happens later into indexManifest when desc.MediaType == ocispec.MediaTypeImageIndex; place the comment immediately above the first json.Unmarshal call (the manifest variable) so future readers understand the intentional, harmless extra unmarshal.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/controller/serverbootconfiguration_pxe_controller.go`:
- Around line 219-221: The URL query values for kernelURL, initrdURL, and
squashFSURL are built by interpolating imageName and imageVersion (and the
digests) without escaping, which can produce malformed URLs; update the
construction to escape query parameter values (e.g., use url.QueryEscape or
build the query with url.Values / url.URL.Query()) so r.IPXEServiceURL remains
the base and imageName, imageVersion, kernelDigest, initrdDigest, squashFSDigest
are added as properly encoded query parameters when creating kernelURL,
initrdURL and squashFSURL.
---
Outside diff comments:
In `@server/imageproxyserver.go`:
- Around line 311-333: The redirect handler forwards all original headers via
copyHeaders, leaking credentials; after creating redirectReq and calling
copyHeaders(resp.Request.Header, redirectReq.Header) in the anonymous redirect
function, explicitly remove sensitive auth headers (e.g.,
redirectReq.Header.Del("Authorization") and
redirectReq.Header.Del("Proxy-Authorization")) before client.Do, so that
replaceResponse and copyHeaders remain unchanged but no credentials are sent to
the redirect target.
---
Duplicate comments:
In `@server/imageproxyserver.go`:
- Around line 283-284: The code currently ignores the error returned by
url.Parse when creating proxyURL from targetURL, which can leave proxyURL nil
and cause a panic in buildDirector / proxy.ServeHTTP; update the block around
targetURL and url.Parse to capture the error, check if err != nil, log or write
an appropriate HTTP error response (or return) and avoid calling buildDirector
or proxy.ServeHTTP with a nil proxyURL; reference the variables and functions
targetURL, url.Parse, proxyURL, buildDirector, and proxy.ServeHTTP when making
the change.
---
Nitpick comments:
In `@internal/controller/serverbootconfig_helpers.go`:
- Around line 203-207: The defer currently prints close errors with fmt.Printf
in fetchContent (defer calling reader.Close()), which is inconsistent with the
logr-based logging; replace that fmt.Printf by either (a) removing/logically
ignoring the close error for read-only streams (i.e., drop the cerr check and
just call reader.Close()), or (b) if you prefer to surface the error, accept a
context-derived logger or a logr.Logger into fetchContent and call
logger.Error(cerr, "failed to close reader") instead; update the defer around
reader.Close() and any function signature changes accordingly.
- Around line 72-134: Add a short clarifying comment in
FindManifestByArchitecture explaining why we first unmarshal manifestData into
an ocispec.Manifest even when desc.MediaType may be ocispec.MediaTypeImageIndex:
note that the initial json.Unmarshal into the variable manifest is safe and may
yield a partial/unused struct for index payloads, and that the real index
parsing happens later into indexManifest when desc.MediaType ==
ocispec.MediaTypeImageIndex; place the comment immediately above the first
json.Unmarshal call (the manifest variable) so future readers understand the
intentional, harmless extra unmarshal.
In `@server/imageproxyserver.go`:
- Around line 185-215: RunImageProxyServer currently registers routes with the
global http.DefaultServeMux via http.HandleFunc which can lead to conflicts;
instead create a dedicated http.ServeMux inside RunImageProxyServer, register
the "/image" and "/httpboot/" handlers on that mux (replacing http.HandleFunc
calls), and pass the mux to http.ListenAndServe(imageProxyServerAddr, mux) so
the server uses the local ServeMux and avoids interfering with other global
handlers.
- Around line 96-106: The code creates separate http.Client instances in
detectRegistryAuth (client := &http.Client{Timeout: 30 * time.Second}) and
similarly in getBearerToken and buildModifyResponse; replace these per-call
clients with a single package-level reusable http.Client (e.g., httpClient)
configured with the 30s Timeout (and optionally a shared Transport to enable
keep-alives) and update detectRegistryAuth, getBearerToken, and
buildModifyResponse to use that shared httpClient to allow connection reuse and
reduce allocations/GC pressure.
- Around line 55-57: The registryCache map (and registryCacheMutex) is unbounded
and leaks memory and can serve stale TokenURL values; update the caching to
include TTL-based eviction or a max size: add a timestamp field (e.g., LastSeen
or ExpiresAt) to RegistryInfo, check that timestamp in every lookup path (where
registryCache is read) and treat expired entries as missing so they are
refreshed, and implement either a simple cleanup goroutine that periodically
removes expired entries from registryCache or replace the map with a TTL cache
(e.g., patrickmn/go-cache) and use that in place of registryCache while still
protecting concurrent access with registryCacheMutex if required. Ensure
TokenURL is refreshed on cache miss/expiry and that additions set the
timestamp/TTL when storing in registryCache.
- Around line 65-76: The extractParam function is fragile: it naively searches
for param+"=\"" and can be fooled by the param name appearing inside a quoted
value, single quotes, escaped/URL-encoded quotes, or overlapping names. Replace
the simple index-based logic in extractParam with a robust token parser that
scans the header left-to-right, splits parameters on unquoted commas, recognizes
both double- and single-quoted values, handles escaped quotes inside quoted
values (and URL-decoded contents if necessary), and returns the matching
parameter's unescaped value (first match). Ensure the parser ignores occurrences
of the param name inside quoted values and supports names that are substrings of
other names. Use the function name extractParam to locate and update the
implementation.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
internal/controller/serverbootconfig_helpers.gointernal/controller/serverbootconfig_helpers_test.gointernal/controller/serverbootconfiguration_http_controller.gointernal/controller/serverbootconfiguration_pxe_controller.gointernal/registry/validation.gointernal/registry/validation_test.goserver/imageproxyserver.go
🚧 Files skipped from review as they are similar to previous changes (2)
- internal/registry/validation.go
- internal/registry/validation_test.go
| kernelURL := fmt.Sprintf("%s/image?imageName=%s&version=%s&layerDigest=%s", r.IPXEServiceURL, imageName, imageVersion, kernelDigest) | ||
| initrdURL := fmt.Sprintf("%s/image?imageName=%s&version=%s&layerDigest=%s", r.IPXEServiceURL, imageName, imageVersion, initrdDigest) | ||
| squashFSURL := fmt.Sprintf("%s/image?imageName=%s&version=%s&layerDigest=%s", r.IPXEServiceURL, imageName, imageVersion, squashFSDigest) |
There was a problem hiding this comment.
URL query parameter values are not escaped.
imageName and imageVersion are interpolated directly into the URL string. If imageName contains characters like @ (for digest refs passed through) or other special characters, the resulting URL may be malformed or misinterpreted by the image proxy server.
Use url.QueryEscape or build the URL with url.Values:
🛠️ Proposed fix using url.Values
+ params := url.Values{}
+ params.Set("imageName", imageName)
+ params.Set("version", imageVersion)
+
- kernelURL := fmt.Sprintf("%s/image?imageName=%s&version=%s&layerDigest=%s", r.IPXEServiceURL, imageName, imageVersion, kernelDigest)
- initrdURL := fmt.Sprintf("%s/image?imageName=%s&version=%s&layerDigest=%s", r.IPXEServiceURL, imageName, imageVersion, initrdDigest)
- squashFSURL := fmt.Sprintf("%s/image?imageName=%s&version=%s&layerDigest=%s", r.IPXEServiceURL, imageName, imageVersion, squashFSDigest)
+ params.Set("layerDigest", kernelDigest)
+ kernelURL := fmt.Sprintf("%s/image?%s", r.IPXEServiceURL, params.Encode())
+ params.Set("layerDigest", initrdDigest)
+ initrdURL := fmt.Sprintf("%s/image?%s", r.IPXEServiceURL, params.Encode())
+ params.Set("layerDigest", squashFSDigest)
+ squashFSURL := fmt.Sprintf("%s/image?%s", r.IPXEServiceURL, params.Encode())🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/controller/serverbootconfiguration_pxe_controller.go` around lines
219 - 221, The URL query values for kernelURL, initrdURL, and squashFSURL are
built by interpolating imageName and imageVersion (and the digests) without
escaping, which can produce malformed URLs; update the construction to escape
query parameter values (e.g., use url.QueryEscape or build the query with
url.Values / url.URL.Query()) so r.IPXEServiceURL remains the base and
imageName, imageVersion, kernelDigest, initrdDigest, squashFSDigest are added as
properly encoded query parameters when creating kernelURL, initrdURL and
squashFSURL.
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
server/imageproxyserver.go (1)
408-417:⚠️ Potential issue | 🟠 MajorClose original response body before replacing it on redirect.
At Line 416,
originalResp.Bodyis overwritten without closing the prior body. Under frequent redirects, this can leak connections and degrade proxy throughput.🛠️ Proposed fix
func replaceResponse(originalResp, redirectResp *http.Response) { + if originalResp.Body != nil { + _ = originalResp.Body.Close() + } + // Preserve all values for multi-valued headers (e.g., Set-Cookie) for name, values := range redirectResp.Header { originalResp.Header.Del(name) // Clear existing values first for _, value := range values { originalResp.Header.Add(name, value) // Add all values } } originalResp.Body = redirectResp.Body originalResp.StatusCode = redirectResp.StatusCode + originalResp.Status = redirectResp.Status + originalResp.ContentLength = redirectResp.ContentLength }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/imageproxyserver.go` around lines 408 - 417, In replaceResponse, close the previous response body to avoid connection leaks by checking if originalResp.Body != nil and calling Close() on it (ignore or handle the error as appropriate) before assigning originalResp.Body = redirectResp.Body; ensure you only close originalResp.Body and do not close redirectResp.Body so the new body remains usable, and then proceed to copy headers and set originalResp.StatusCode as currently done.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@server/imageproxyserver.go`:
- Around line 43-45: Update TokenResponse to include both Token and AccessToken
fields (e.g., TokenResponse { Token string `json:"token"`; AccessToken string
`json:"access_token"` }) and change token handling to prefer AccessToken when
set, falling back to Token. After fetching the token HTTP response (the code
path that currently unmarshals into TokenResponse), first check the response
status (only unmarshal on 2xx success); if it's not a success, read the response
body and return or log a clear error including the status code and body instead
of attempting to unmarshal. Ensure the token consumer (where the token string is
used) reads the chosen field (AccessToken || Token).
- Around line 71-73: extractRepository currently only trims the registry prefix
and therefore preserves tag (':tag') and digest ('@sha...') suffixes and fails
to map single-component Docker Hub refs to library/<name>. Fix extractRepository
to: 1) remove any digest by splitting on '@' and taking the left part; 2) remove
a tag by splitting on the last ':' only if that ':' appears after the last '/'
(so you don't strip registry ports); and 3) if the registryDomain corresponds to
Docker Hub (e.g., "docker.io" or the implicit hub case) and the resulting
repository has no '/', prefix it with "library/". Update the extractRepository
function and any other places that replicate this logic (the other extraction
sites mentioned in your review) so they use the corrected behavior.
- Around line 131-133: The code currently checks the WWW-Authenticate header
with a case-sensitive prefix (strings.HasPrefix(authHeader, "Bearer ")), causing
lowercase or mixed-case schemes to be missed; update the check to be
case-insensitive (e.g., compare strings.HasPrefix(strings.ToLower(authHeader),
"bearer ")), then set info.AuthMethod = AuthBearer and call
extractTokenURL(authHeader, repository) as before so token parsing still uses
the original header value.
---
Outside diff comments:
In `@server/imageproxyserver.go`:
- Around line 408-417: In replaceResponse, close the previous response body to
avoid connection leaks by checking if originalResp.Body != nil and calling
Close() on it (ignore or handle the error as appropriate) before assigning
originalResp.Body = redirectResp.Body; ensure you only close originalResp.Body
and do not close redirectResp.Body so the new body remains usable, and then
proceed to copy headers and set originalResp.StatusCode as currently done.
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (3)
server/imageproxyserver.go (3)
251-254: Consider defensive nil check on required validator parameter.If
validatorisnil, the server will panic whenvalidator.IsRegistryAllowed()is called at Line 325. While the caller should always provide a valid validator, a defensive check with a clear error message would aid debugging.🛡️ Optional: Defensive nil check
func RunImageProxyServer(imageProxyServerAddr string, k8sClient client.Client, validator *registry.Validator, log logr.Logger) { + if validator == nil { + log.Error(nil, "Registry validator is required but was nil") + panic("registry validator is required") + } + // Start background cleanup of expired cache entries go cleanupExpiredCacheEntries(log)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/imageproxyserver.go` around lines 251 - 254, RunImageProxyServer currently assumes validator is non-nil and will panic when calling validator.IsRegistryAllowed; add a defensive nil check at the start of RunImageProxyServer that detects if validator == nil and logs a clear error (using the provided log) and exits/returns early (or panic with a clear message) so subsequent calls like validator.IsRegistryAllowed do not dereference nil. Target the RunImageProxyServer function and the validator parameter for this change.
86-98: Parameter name matching is case-sensitive; RFC 7235 specifies case-insensitivity.The
extractParamfunction matches parameter names exactly (e.g., "realm"). Per RFC 7235, auth-param names are case-insensitive, soRealmorREALMwould be missed. In practice, OCI registries use lowercase, so this is low risk.♻️ Optional: Case-insensitive parameter extraction
func extractParam(header, param string) string { - start := strings.Index(header, param+"=\"") + lowerHeader := strings.ToLower(header) + lowerParam := strings.ToLower(param) + start := strings.Index(lowerHeader, lowerParam+"=\"") if start == -1 { return "" } - start += len(param) + 2 - end := strings.Index(header[start:], "\"") + start += len(lowerParam) + 2 + end := strings.Index(lowerHeader[start:], "\"") if end == -1 { return "" } + // Return value from original header to preserve case return header[start : start+end] }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/imageproxyserver.go` around lines 86 - 98, The extractParam function currently matches parameter names case-sensitively; update extractParam(header, param string) to perform a case-insensitive match (per RFC 7235) when locating param+"=\"" in the WWW-Authenticate header — e.g., operate on a lowercase copy of the header or use a case-insensitive search/regex (or strings.EqualFold when comparing tokens) so names like "Realm" or "REALM" are recognized, then extract the value from the original header slice accordingly and return the parameter value as before.
222-249: Consider adding a cache size limit for production hardening.The cache cleanup only removes expired entries but doesn't limit total cache size. With many unique registry/repository combinations, the cache could grow unbounded. For most deployments this is fine, but at scale an LRU eviction policy or max-size limit could be beneficial.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/imageproxyserver.go` around lines 222 - 249, cleanupExpiredCacheEntries only prunes by expiration and doesn't prevent unbounded growth; add a max-size policy (e.g., const maxRegistryCacheEntries) and enforce it when mutating registryCache: when len(registryCache) exceeds the limit, evict entries until within limit using either the oldest-by-expiresAt heuristic or an LRU structure (maintain a doubly-linked list or timestamp on each registryCache entry). Update places that insert into registryCache (the code that sets registryCache[key] = entry) to acquire registryCacheMutex, check len(registryCache) > maxRegistryCacheEntries and perform eviction logic (remove keys from registryCache and update any LRU metadata), and keep cleanupExpiredCacheEntries as a periodic expiration pass that also logs evictions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@server/imageproxyserver.go`:
- Around line 398-401: The redirect handling creates redirectReq with
http.NewRequest which uses background context and won't be cancelled when the
client disconnects; change the creation to use http.NewRequestWithContext and
pass the original incoming request's context (e.g., r.Context()) so the
redirectReq (the variable named redirectReq created with location.String())
inherits cancellation/timeout from the original request context.
- Around line 195-199: The error handling uses io.ReadAll(resp.Body) without
bounds, which can OOM on large responses; change the read to use a limited
reader (e.g., io.ReadAll(io.LimitReader(resp.Body, maxErrorBodySize))) or an
equivalent copy with io.CopyN and choose a sensible constant (e.g.,
maxErrorBodySize = 64*1024) near the top of the file; update the error message
to indicate the body may be truncated and reference the existing resp and
io.ReadAll usage in the token request handling (the block that checks
resp.StatusCode != http.StatusOK) so you only replace that read with a bounded
read.
---
Nitpick comments:
In `@server/imageproxyserver.go`:
- Around line 251-254: RunImageProxyServer currently assumes validator is
non-nil and will panic when calling validator.IsRegistryAllowed; add a defensive
nil check at the start of RunImageProxyServer that detects if validator == nil
and logs a clear error (using the provided log) and exits/returns early (or
panic with a clear message) so subsequent calls like validator.IsRegistryAllowed
do not dereference nil. Target the RunImageProxyServer function and the
validator parameter for this change.
- Around line 86-98: The extractParam function currently matches parameter names
case-sensitively; update extractParam(header, param string) to perform a
case-insensitive match (per RFC 7235) when locating param+"=\"" in the
WWW-Authenticate header — e.g., operate on a lowercase copy of the header or use
a case-insensitive search/regex (or strings.EqualFold when comparing tokens) so
names like "Realm" or "REALM" are recognized, then extract the value from the
original header slice accordingly and return the parameter value as before.
- Around line 222-249: cleanupExpiredCacheEntries only prunes by expiration and
doesn't prevent unbounded growth; add a max-size policy (e.g., const
maxRegistryCacheEntries) and enforce it when mutating registryCache: when
len(registryCache) exceeds the limit, evict entries until within limit using
either the oldest-by-expiresAt heuristic or an LRU structure (maintain a
doubly-linked list or timestamp on each registryCache entry). Update places that
insert into registryCache (the code that sets registryCache[key] = entry) to
acquire registryCacheMutex, check len(registryCache) > maxRegistryCacheEntries
and perform eviction logic (remove keys from registryCache and update any LRU
metadata), and keep cleanupExpiredCacheEntries as a periodic expiration pass
that also logs evictions.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
server/imageproxyserver.go (1)
232-259: Consider adding a shutdown mechanism for graceful cleanup.The cleanup goroutine runs indefinitely with no way to stop it. While this works fine for a long-running server (process exit terminates it), adding a context-based cancellation would enable clean shutdown in testing and containerized environments.
♻️ Optional: Add context-based cancellation
-func cleanupExpiredCacheEntries(log logr.Logger) { +func cleanupExpiredCacheEntries(ctx context.Context, log logr.Logger) { ticker := time.NewTicker(5 * time.Minute) defer ticker.Stop() - for range ticker.C { + for { + select { + case <-ctx.Done(): + log.V(1).Info("Cache cleanup stopped") + return + case <-ticker.C: + } now := time.Now() // ... rest of cleanup logic } }Then in
RunImageProxyServer, pass a cancellable context.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/imageproxyserver.go` around lines 232 - 259, The cleanup goroutine in cleanupExpiredCacheEntries has no stop mechanism; change its signature to accept a context (ctx context.Context) and make the for loop select on ctx.Done() and ticker.C so it returns and stops the ticker when cancelled; ensure registryCacheMutex is still used as before and that you stop the ticker via defer ticker.Stop(). Then update RunImageProxyServer to create/accept a cancellable context (or propagate the server context) and pass that context into cleanupExpiredCacheEntries so the cleanup goroutine can be cancelled for graceful shutdown.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@server/imageproxyserver.go`:
- Around line 232-259: The cleanup goroutine in cleanupExpiredCacheEntries has
no stop mechanism; change its signature to accept a context (ctx
context.Context) and make the for loop select on ctx.Done() and ticker.C so it
returns and stops the ticker when cancelled; ensure registryCacheMutex is still
used as before and that you stop the ticker via defer ticker.Stop(). Then update
RunImageProxyServer to create/accept a cancellable context (or propagate the
server context) and pass that context into cleanupExpiredCacheEntries so the
cleanup goroutine can be cancelled for graceful shutdown.
Proposed Changes
Replace registry-specific handlers (handleGHCR, handleKeppel) with a generic handleDockerRegistry() that works with any OCI-compliant registry
Add multi-layer registry validation to prevent unauthorised registry access before expensive OCI operations
Layer 1: Controller validation before image inspection (early failure with clear errors)
Layer 2: ImageProxy runtime validation (security boundary at HTTP request time)
Create shared registry validation package
Add early validation to PXE and HTTP Boot controllers before calling docker.NewResolver().Resolve() to provide clear error messages
Fix missing error state patching in HTTP Boot controller to match PXE patterns
Implement fail-closed security model - deny all registries if neither ALLOWED_REGISTRIES nor BLOCKED_REGISTRIES is configured
Fixes #275
Summary by CodeRabbit
New Features
Tests