✨ Add llm-proxy deployment #496
Conversation
Signed-off-by: Fabian von Feilitzsch <[email protected]>
Walkthrough

Adds an LLM proxy feature: new defaults and provider mappings, Ansible tasks to deploy the proxy and update Kai CR status, and multiple Kubernetes templates (ConfigMaps, Deployment, Service, client config) with Keycloak and multi-provider LLM wiring.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Ansible as Ansible Role
    participant K8s as Kubernetes API
    participant Operator as Operator (CR updates)
    Ansible->>K8s: create ConfigMap (llm-proxy), Deployment, Service, client ConfigMap (when enabled)
    opt ansible_operator_meta defined
        Ansible->>Operator: include kai-status.yml
        Operator->>K8s: get Deployment llm-proxy
        Operator->>K8s: get Deployment kai-api
        Operator->>Operator: compute readiness flags
        Operator->>K8s: update CR conditions (LLMProxyReady, KaiSolutionServerReady, KaiAPIKeysConfigured)
    end
```
```mermaid
sequenceDiagram
    participant Client as Client
    participant Proxy as LLM Proxy
    participant Keycloak as Keycloak
    participant Provider as LLM Provider
    participant DB as PostgreSQL
    Client->>Proxy: request + JWT
    Proxy->>Keycloak: validate token
    alt token valid
        Keycloak-->>Proxy: validated
        Proxy->>Provider: forward request (provider-specific credentials)
        Provider-->>Proxy: LLM response
        Proxy->>DB: store/track request
        Proxy-->>Client: return response
    else token invalid
        Keycloak-->>Proxy: invalid
        Proxy-->>Client: 401/deny
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks: ✅ 5 passed
Actionable comments posted: 3
🧹 Nitpick comments (7)
roles/tackle/tasks/main.yml (1)
914-917: Debug logging is safe; consider gating if it becomes noisy

This debug task mirrors the existing `when: kai_solution_server_enabled | bool` usage and doesn't introduce new failure modes, but it will log on every run. If logs get too chatty over time, consider wrapping it in a `when: ansible_verbosity | int >= 2` (or similar) so it only appears in more verbose runs.
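For illustration, a minimal sketch of the verbosity-gated variant (the task name and debugged variable are assumed for the example, not copied from the PR):

```yaml
- name: Debug Kai solution server settings
  debug:
    var: kai_solution_server_enabled
  when:
    - kai_solution_server_enabled | bool
    - ansible_verbosity | int >= 2  # only log on -vv or more verbose runs
```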
roles/tackle/templates/kai/kai-api-deployment.yaml.j2 (1)

29-37: Good defensive guard around secret-derived env vars

The `if` around `kai_api_key_secret_status.resources | length > 0` correctly prevents indexing `resources.0` when the secret is missing, avoiding template failures while keeping behavior unchanged when it exists. If you want to tighten it further, you could also guard `resources` with a `| default([])` to handle the (unlikely) case where `kai_api_key_secret_status` is unset, but that's optional.
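A minimal sketch of that extra guard (the surrounding template content is assumed):

```jinja
{% if (kai_api_key_secret_status | default({})).resources | default([]) | length > 0 %}
  {# safe to reference kai_api_key_secret_status.resources.0 here #}
{% endif %}
```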
roles/tackle/templates/kai/kai-db-deployment.yaml.j2 (1)

31-45: Adding POSTGRES_* aliases is reasonable; confirm image expectations

Mirroring the existing `POSTGRESQL_*` vars with `POSTGRES_*` aliases wired to the same secret keys is a sensible way to support images/tooling expecting either naming scheme. Just confirm that the `kai_database_image_fqin` image actually consumes the desired variant(s) so you're not carrying unused env vars.
roles/tackle/templates/kai/llm-proxy-ingress.yaml.j2 (1)

1-45: Ingress wiring is sound; consider refining TLS placeholder

The ingress correctly targets `llm-proxy:8321`, parametrizes `ingressClassName`, and sets reasonable nginx/alb annotations. The `tls: - {}` placeholder, though valid YAML, may behave inconsistently across controllers; you might prefer to either omit `tls` entirely when no secret/hosts are configured or model it after whatever pattern you use in the existing UI ingress template. A sketch of the conditional form follows.
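For example, a hedged sketch that only emits a `tls` section when a secret is configured (`llm_proxy_tls_secret_name` and `llm_proxy_ingress_host` are hypothetical variables, not defined in this PR):

```jinja
{% if llm_proxy_tls_secret_name is defined %}
  tls:
    - hosts:
        - {{ llm_proxy_ingress_host }}
      secretName: {{ llm_proxy_tls_secret_name }}
{% endif %}
```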
roles/tackle/templates/kai/llm-proxy-deployment.yaml.j2 (1)

62-68: Normalize `feature_auth_required` to a boolean in Jinja conditionals

Here you use:

```jinja
{% if feature_auth_required and keycloak_sso_proto == 'https' %} ... {% endif %}
```

and similarly for the trusted-ca volume. Elsewhere in the role, `feature_auth_required` is typically used as `feature_auth_required | bool` because it's derived via a Jinja expression and may be a string value.

For consistency (and to avoid surprises where a string `"false"` still evaluates truthy), consider:

```diff
-{% if feature_auth_required and keycloak_sso_proto == 'https' %}
+{% if feature_auth_required | bool and keycloak_sso_proto == 'https' %}
 ...
-{% if feature_auth_required and keycloak_sso_proto == 'https' %}
+{% if feature_auth_required | bool and keycloak_sso_proto == 'https' %}
```

This keeps the behavior aligned with the rest of the templates and ensures the trusted CA wiring only kicks in when auth is actually required.
Also applies to: 89-115
roles/tackle/defaults/main.yml (1)
123-123: LLM proxy defaults are coherent; consider guardingkai_llm_modelin computed model idThe new defaults for
keycloak_api_audience,kai_llm_params_proxy, and thellm_proxy_*provider/API-key mappings line up well with the deployment and ConfigMap templates.One small edge case: with the current definition
kai_llm_params_proxy: model: "{{ llm_proxy_provider_id }}/{{ kai_llm_model }}"and
kai_llm_modeldefaulting tonull, you’ll end up with values likeopenai/Noneif someone enables the proxy without explicitly setting a model. You might want a safer default or explicit guard, for example:-kai_llm_params_proxy: - model: "{{ llm_proxy_provider_id }}/{{ kai_llm_model }}" +kai_llm_params_proxy: + model: >- + {{ (llm_proxy_provider_id ~ '/' ~ kai_llm_model) + if kai_llm_model else None }}or simply document that
kai_llm_modelis required wheneverkai_llm_proxy_enabledis set.Also applies to: 294-351
roles/tackle/tasks/kai-status.yml (1)
22-39: Downgrade severity: This is optional refactoring, not a critical issueThe code currently works correctly due to Jinja2's reliable short-circuit evaluation of the
andoperator. Whenkai_llm_proxy_enabledis false (the default), the first part of the AND expression fails andllm_proxy_deployment.resourcesis never evaluated—preventing any undefined variable error.However, the review's suggestion is valid as a defensive programming practice to make the guard explicit rather than relying on implicit short-circuit behavior. This improves code clarity and maintainability.
The suggested fix is sound, but this is a code quality improvement recommendation, not a critical bug:
- name: Set LLM proxy ready status when: kai_llm_proxy_enabled | bool and (llm_proxy_deployment is defined) set_fact: llm_proxy_ready: "{{ llm_proxy_deployment.resources | length > 0 and llm_proxy_deployment.resources[0].status.readyReplicas | default(0) > 0 and llm_proxy_deployment.resources[0].status.readyReplicas | default(0) == llm_proxy_deployment.resources[0].status.replicas | default(1) and llm_proxy_deployment.resources[0].status.updatedReplicas | default(0) == llm_proxy_deployment.resources[0].status.replicas | default(1) }}"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)

- roles/tackle/defaults/main.yml (2 hunks)
- roles/tackle/tasks/kai-status.yml (1 hunks)
- roles/tackle/tasks/kai.yml (2 hunks)
- roles/tackle/tasks/main.yml (1 hunks)
- roles/tackle/templates/deployment-hub.yml.j2 (1 hunks)
- roles/tackle/templates/kai/kai-api-deployment.yaml.j2 (1 hunks)
- roles/tackle/templates/kai/kai-db-deployment.yaml.j2 (1 hunks)
- roles/tackle/templates/kai/llm-proxy-configmap.yaml.j2 (1 hunks)
- roles/tackle/templates/kai/llm-proxy-deployment.yaml.j2 (1 hunks)
- roles/tackle/templates/kai/llm-proxy-ingress.yaml.j2 (1 hunks)
- roles/tackle/templates/kai/llm-proxy-service.yaml.j2 (1 hunks)
🔇 Additional comments (3)
roles/tackle/templates/kai/llm-proxy-service.yaml.j2 (1)

1-19: Service wiring looks correct; ensure pod labels match selector

ClusterIP service on port 8321 with selector `app: llm-proxy` aligns with the ingress/backend usage. Just double-check the llm-proxy Deployment labels its pods with `app: llm-proxy` so this selector actually picks them up.
roles/tackle/templates/deployment-hub.yml.j2 (1)

101-134: KEYCLOAK_AUDIENCE wiring looks consistent; ensure it matches Keycloak client config

Adding `KEYCLOAK_AUDIENCE` sourced from `keycloak_api_audience` under the existing Keycloak-only block is consistent with how the same variable is used in the LLM proxy config. Just make sure the value (`"konveyor-api"` by default) actually matches the audience/client settings in Keycloak so JWT validation is coherent across Hub, Kai, and the proxy.
roles/tackle/tasks/kai.yml (1)

149-152: Status update hook for Kai components looks correct

Importing `kai-status.yml` only when `ansible_operator_meta` is defined is a clean way to keep CR status updates operator-only while reusing the same Kai deployment tasks elsewhere. A minimal sketch of the pattern follows.
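A sketch of that gating, assuming the hook sits at the end of kai.yml:

```yaml
- name: Update Kai CR status (operator-managed runs only)
  include_tasks: kai-status.yml
  when: ansible_operator_meta is defined
```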
roles/tackle/tasks/kai.yml
Outdated
```yaml
- name: Get Keycloak external URL
  when:
    - kai_llm_proxy_enabled | bool
    - feature_auth_required | bool
    - keycloak_external_url is not defined
  block:
    - name: "Get Keycloak route on OpenShift"
      when: openshift_cluster
      k8s_info:
        api_version: route.openshift.io/v1
        kind: Route
        name: "{{ keycloak_sso_name }}"
        namespace: "{{ app_namespace }}"
      register: keycloak_route

    - name: "Set Keycloak external URL from route"
      when:
        - openshift_cluster
        - keycloak_route.resources | length > 0
      set_fact:
        keycloak_external_url: "https://{{ keycloak_route.resources[0].spec.host }}"

    - name: "Get Keycloak ingress on Kubernetes"
      when: not openshift_cluster
      k8s_info:
        api_version: networking.k8s.io/v1
        kind: Ingress
        name: "{{ keycloak_sso_name }}"
        namespace: "{{ app_namespace }}"
      register: keycloak_ingress

    - name: "Set Keycloak external URL from ingress"
      when:
        - not openshift_cluster
        - keycloak_ingress.resources | length > 0
      set_fact:
        keycloak_external_url: "http://{{ keycloak_ingress.resources[0].spec.rules[0].host }}/auth"

- name: Deploy LLM Proxy
  when: kai_llm_proxy_enabled | bool
  block:
    - name: Create LLM Proxy ConfigMap
      k8s:
        state: present
        template: kai/llm-proxy-configmap.yaml.j2

    - name: Create LLM Proxy Deployment
      k8s:
        state: present
        template: kai/llm-proxy-deployment.yaml.j2

    - name: Create LLM Proxy Service
      k8s:
        state: present
        template: kai/llm-proxy-service.yaml.j2

    - name: Create LLM Proxy Ingress
      k8s:
        state: present
        template: kai/llm-proxy-ingress.yaml.j2
      when: not openshift_cluster

    # TODO: Determine LLM proxy external URL
    # After creating the ingress/route, we should:
    # 1. Query the ingress/route to get the actual external URL
    # 2. Store it as llm_proxy_external_url
    # 3. Make it available to users (perhaps in status or ConfigMap)
    # This URL is needed for external clients to connect to the proxy
```
🧩 Analysis chain
Keycloak external URL discovery is good, but currently unused, and path/scheme may not match real Keycloak

The route/ingress discovery logic is a solid start, but a couple of points to tighten:

- `keycloak_external_url` isn't used in the provided `llm-proxy-configmap.yaml.j2`; the proxy auth section still points at the internal service. If the JWT `iss` claim is based on the external URL, you'll likely want to plumb `keycloak_external_url` into the LLM proxy's issuer/`jwks.uri` instead of (or in addition to) the cluster-internal host.
- The Kubernetes Ingress branch hardcodes `http://{{ keycloak_ingress.resources[0].spec.rules[0].host }}/auth`. That may not match clusters where:
  - TLS is terminated at the ingress (should be `https://`), or
  - the Keycloak path isn't `/auth` (or is already included in the ingress path).

You might want to:

- Normalize `keycloak_external_url` to a single base (scheme + host [+ optional base path]), and
- Derive the issuer/JWKS URLs from that in the proxy config, with a fallback to the internal service if discovery fails, as sketched below.
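A hedged sketch of that plumbing in the proxy ConfigMap template (variable names taken from this PR; the fallback expression itself is illustrative, not a definitive implementation):

```jinja
{% set keycloak_base = keycloak_external_url
     | default(keycloak_sso_proto ~ '://' ~ keycloak_sso_service_name ~ '.'
               ~ app_namespace ~ '.svc:' ~ keycloak_sso_port ~ '/auth') %}
auth:
  provider_config:
    type: "oauth2_token"
    jwks:
      uri: "{{ keycloak_base }}/realms/{{ keycloak_sso_realm }}/protocol/openid-connect/certs"
    issuer: "{{ keycloak_base }}/realms/{{ keycloak_sso_realm }}"
    audience: "{{ keycloak_api_audience }}"
```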
🏁 Script executed:

```bash
#!/bin/bash
# Find the LLM proxy configuration template
fd -t f "llm-proxy" --type file | head -20
```

Length of output: 216

🏁 Script executed:

```bash
#!/bin/bash
# Search for references to keycloak_external_url
rg -n "keycloak_external_url" --type yaml --type j2
```

Length of output: 26

🏁 Script executed:

```bash
#!/bin/bash
# Find and examine the llm-proxy-configmap template
find . -name "llm-proxy-configmap.yaml.j2" -o -name "*llm*proxy*" -type f 2>/dev/null | head -20
```

Length of output: 224

🏁 Script executed:

```bash
#!/bin/bash
# Search for keycloak references in proxy or auth config
rg -n -C2 "keycloak|issuer|jwks" --type yaml --type j2 | grep -A2 -B2 "proxy\|auth\|config" | head -50
```

Length of output: 26

🏁 Script executed:

```bash
#!/bin/bash
# Look at the kai.yml file around lines 71-139 for context
cat roles/tackle/tasks/kai.yml | sed -n '60,150p'
```

Length of output: 2917

🏁 Script executed:

```bash
#!/bin/bash
# Examine the LLM proxy configmap template
cat roles/tackle/templates/kai/llm-proxy-configmap.yaml.j2
```

Length of output: 3765

🏁 Script executed:

```bash
#!/bin/bash
# Search for keycloak_external_url in all files
rg -n "keycloak_external_url" --no-type
```

Length of output: 435

🏁 Script executed:

```bash
#!/bin/bash
# Search for keycloak references in templates (no type filter)
rg -n "keycloak" roles/tackle/templates/kai/ -A2 -B2
```

Length of output: 2310

🏁 Script executed:

```bash
#!/bin/bash
# Search for issuer and jwks patterns
rg -n "issuer|jwks|iss_uri|token_endpoint" roles/tackle/templates/kai/
```

Length of output: 277

🏁 Script executed:

```bash
#!/bin/bash
# Look at all template files to understand structure
ls -la roles/tackle/templates/kai/ | grep -E "llm-proxy|kai-api"
```

Length of output: 453

🏁 Script executed:

```bash
#!/bin/bash
# Search entire codebase for any reference to keycloak_external_url
rg "keycloak_external_url"
```

Length of output: 320

🏁 Script executed:

```bash
#!/bin/bash
# Check if keycloak_sso_service_name and keycloak_sso_port are defined
rg -n "keycloak_sso_service_name|keycloak_sso_port" roles/tackle/defaults/ roles/tackle/vars/ 2>/dev/null | head -20
```

Length of output: 916

🏁 Script executed:

```bash
#!/bin/bash
# Verify what keycloak_sso_service_name typically resolves to
rg -n "keycloak_sso_service_name.*=" roles/tackle/
```

Length of output: 0
Keycloak external URL is discovered but unused; JWT validation will fail

The review comment is accurate. Verification confirms:

- `keycloak_external_url` is computed in `roles/tackle/tasks/kai.yml` (lines ~94, ~104) but has zero references anywhere else in the codebase; it is never passed to any template.
- The LLM proxy auth config (`roles/tackle/templates/kai/llm-proxy-configmap.yaml.j2`, lines 113-114) hardcodes internal Kubernetes service URLs (`keycloak_sso_service_name`) instead of using the external URL.
- If Keycloak issues JWTs with an external issuer (standard for ingress/route deployments), the proxy will reject them because the issuer claim won't match the internal service URL configured in the proxy.
- The hardcoded `http://` scheme (lines 104 and 113) may be incorrect if TLS terminates at the ingress.
- The hardcoded `/auth` path (lines 104 and 113) assumes a fixed Keycloak path structure that may not hold across deployments.

Recommendation: plumb `keycloak_external_url` into the proxy config template as the issuer/JWKS base, and derive the full URIs from it, falling back to the internal service if discovery fails.
🤖 Prompt for AI Agents
In roles/tackle/tasks/kai.yml lines 71-139, keycloak_external_url is computed
but never used; update the task flow to pass keycloak_external_url into the LLM
Proxy ConfigMap template (e.g., add vars to the k8s template call or set a fact
consumed by the template) so the proxy can use it as the issuer/JWKS base;
modify the value you pass to derive full URIs from keycloak_external_url
(respecting its scheme/host/path) and only fall back to the internal service
name (keycloak_sso_service_name and port) if discovery is missing, and avoid
hardcoding "http://" or the "/auth" path in the task—let the template construct
issuer/jwks endpoints from the provided external URL with a safe fallback to
internal service when needed.
```yaml
server:
  port: 8321
{% if feature_auth_required %}
  auth:
    provider_config:
      type: "oauth2_token"
      jwks:
        uri: "http://{{ keycloak_sso_service_name }}:{{ keycloak_sso_port }}/auth/realms/{{ keycloak_sso_realm }}/protocol/openid-connect/certs"
      issuer: "http://{{ keycloak_sso_service_name }}.{{ app_namespace }}.svc:{{ keycloak_sso_port }}/auth/realms/{{ keycloak_sso_realm }}"
      audience: "{{ keycloak_api_audience }}"
{% endif %}
telemetry:
  enabled: false
```
🧩 Analysis chain
Align OAuth2 issuer/jwks.uri with Keycloak scheme and external URL

The auth config is currently:

```yaml
auth:
  provider_config:
    type: "oauth2_token"
    jwks:
      uri: "http://{{ keycloak_sso_service_name }}:{{ keycloak_sso_port }}/auth/realms/{{ keycloak_sso_realm }}/protocol/openid-connect/certs"
    issuer: "http://{{ keycloak_sso_service_name }}.{{ app_namespace }}.svc:{{ keycloak_sso_port }}/auth/realms/{{ keycloak_sso_realm }}"
    audience: "{{ keycloak_api_audience }}"
```

Issues:

- The scheme is hardcoded to `http://` even though `keycloak_sso_port` is `8443` when TLS is enabled. That will fail against a TLS-only Keycloak.
- Both `issuer` and `jwks.uri` point at the cluster-internal service name, while Keycloak typically mints tokens with `iss` pointing at the external URL (route/ingress, including `KEYCLOAK_FRONTEND_URL`). This mismatch can cause issuer validation to fail in the LLM proxy.

At minimum, it's safer to respect `keycloak_sso_proto`:

```diff
-    jwks:
-      uri: "http://{{ keycloak_sso_service_name }}:{{ keycloak_sso_port }}/auth/realms/{{ keycloak_sso_realm }}/protocol/openid-connect/certs"
-    issuer: "http://{{ keycloak_sso_service_name }}.{{ app_namespace }}.svc:{{ keycloak_sso_port }}/auth/realms/{{ keycloak_sso_realm }}"
+    jwks:
+      uri: "{{ keycloak_sso_proto }}://{{ keycloak_sso_service_name }}.{{ app_namespace }}.svc:{{ keycloak_sso_port }}/auth/realms/{{ keycloak_sso_realm }}/protocol/openid-connect/certs"
+    issuer: "{{ keycloak_sso_proto }}://{{ keycloak_sso_service_name }}.{{ app_namespace }}.svc:{{ keycloak_sso_port }}/auth/realms/{{ keycloak_sso_realm }}"
```

Given you already compute `keycloak_external_url` in `roles/tackle/tasks/kai.yml` when the LLM proxy + auth are enabled, you may want to go further and base issuer/jwks.uri on that external URL (with a fallback to the internal service) so they match the `iss` claim seen by external clients.

I'd recommend validating this end-to-end with a real JWT from your Keycloak to ensure the proxy's issuer/audience checks pass as intended.
I'll verify the review comment by examining the codebase for the relevant configuration variables and patterns.
🏁 Script executed:

```bash
#!/bin/bash
# 1. Display the template file to verify current state
echo "=== Template content ==="
cat roles/tackle/templates/kai/llm-proxy-configmap.yaml.j2
echo -e "\n=== Searching for keycloak_sso_proto definition ==="
rg -n "keycloak_sso_proto" --type yaml --type yml -C 3
echo -e "\n=== Searching for keycloak_external_url ==="
rg -n "keycloak_external_url" -C 3
echo -e "\n=== Searching for keycloak_sso_port definition ==="
rg -n "keycloak_sso_port" --type yaml --type yml -C 2
```

Length of output: 5442
I need to search without the file type filter that caused the error:
🏁 Script executed:

```bash
#!/bin/bash
# Search for keycloak_sso_proto definition
echo "=== Searching for keycloak_sso_proto ==="
rg -n "keycloak_sso_proto" -C 3
echo -e "\n=== Searching for keycloak_sso_port ==="
rg -n "keycloak_sso_port" -C 3
echo -e "\n=== Looking for variable defaults in defaults/main.yml ==="
find . -name "defaults" -o -name "main.yml" | head -20
rg -n "keycloak" roles/tackle/defaults/main.yml 2>/dev/null | head -30
echo -e "\n=== Checking if keycloak_sso_proto is used elsewhere ==="
rg -n "keycloak_sso_proto\|keycloak_sso_port" --type py --type yml --type yaml --type j2 | head -40
```

Length of output: 9740
Use `keycloak_sso_proto` for OAuth2 auth URLs to fix TLS mismatch

The review identifies a confirmed bug in `roles/tackle/templates/kai/llm-proxy-configmap.yaml.j2` lines 113-114. The auth config hardcodes the scheme to `http://` even though:

- `keycloak_sso_proto` is defined as `'https'` when TLS is enabled and `'http'` when disabled
- `keycloak_sso_port` is `8443` when TLS is enabled

This causes OAuth2 issuer validation to fail in HTTPS/TLS deployments. Additionally, `keycloak_external_url` is already computed in the role when LLM proxy + auth are enabled, so using it for the issuer and `jwks.uri` would align token validation with external client expectations.

At minimum, update lines 113-114 to use `{{ keycloak_sso_proto }}` instead of the hardcoded `http://`. Ideally, base both URLs on the computed `keycloak_external_url` to match Keycloak's token issuer claim.
🤖 Prompt for AI Agents
In roles/tackle/templates/kai/llm-proxy-configmap.yaml.j2 lines 106 to 118: the OAuth2 URLs are hardcoded with "http://", which breaks issuer/jwks validation in TLS deployments. Replace the hardcoded scheme with the proper templated value and preferably use the precomputed keycloak_external_url: update jwks.uri and issuer to build URLs using {{ keycloak_sso_proto }}, {{ keycloak_sso_service_name }}, {{ keycloak_sso_port }}, and {{ keycloak_sso_realm }} or, better, use {{ keycloak_external_url }} (ensuring it includes proto, host, and port) so both jwks.uri and issuer match Keycloak's external issuer value.
Signed-off-by: Fabian von Feilitzsch <[email protected]>
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: llm-proxy-config
```
`-config` seems redundant on a ConfigMap. Like having all the pod resources carry a `-pod` suffix in the name. Just a nit.
fixed
Actionable comments posted: 2
🧹 Nitpick comments (3)
roles/tackle/templates/kai/llm-proxy-deployment.yaml.j2 (1)
94-100: Review resource limits for production workloads.The current resource configuration:
- Requests: 256Mi RAM, 100m CPU
- Limits: 512Mi RAM, 500m CPU
These limits may be insufficient for LLM proxy operations under load. Consider:
- Whether 512Mi RAM is adequate for the proxy's memory footprint
- If 500m CPU (0.5 cores) provides acceptable response times
- Whether these should be configurable via variables (similar to other components in the codebase)
Consider making resources configurable:
In
roles/tackle/defaults/main.yml, add:kai_llm_proxy_container_requests_cpu: "100m" kai_llm_proxy_container_requests_memory: "256Mi" kai_llm_proxy_container_limits_cpu: "500m" kai_llm_proxy_container_limits_memory: "512Mi"Then update the template:
resources: requests: - memory: "256Mi" - cpu: "100m" + memory: "{{ kai_llm_proxy_container_requests_memory }}" + cpu: "{{ kai_llm_proxy_container_requests_cpu }}" limits: - memory: "512Mi" - cpu: "500m" + memory: "{{ kai_llm_proxy_container_limits_memory }}" + cpu: "{{ kai_llm_proxy_container_limits_cpu }}"roles/tackle/defaults/main.yml (2)
312-319: LGTM: LLM proxy feature flags and service configuration.

The defaults provide sensible values:

- `kai_llm_proxy_enabled: false` (opt-in)
- Internal service URL with explicit port
- Versioned container image reference

Note that `llamastack/distribution-starter:latest` uses a mutable tag. Consider pinning to a specific version for production deployments to ensure reproducibility, for example as sketched below.
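For instance, a pinned default might look like this (the variable name and version are hypothetical, not from the PR):

```yaml
kai_llm_proxy_image: "llamastack/distribution-starter:0.2.0"  # pinned tag instead of :latest
```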
337-348: Review API key environment variable mapping.

The `kai_llm_proxy_api_key_env_map` provides mappings for common providers. The fallback logic on line 348 constructs `<PROVIDER>_API_KEY` by uppercasing the provider name.

Consider edge cases:

- Provider names with hyphens (e.g., "azure-openai") would become "AZURE-OPENAI_API_KEY" (an invalid env var name)
- Multi-word providers may not follow the simple uppercase pattern

For robustness, document the expected provider name format or add validation.
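As an illustrative guard (not in the PR), the fallback could normalize the provider name before building the env var:

```jinja
{# e.g. "azure-openai" -> "AZURE_OPENAI_API_KEY" #}
{% set api_key_env = kai_llm_proxy_api_key_env_map.get(
     kai_llm_provider,
     kai_llm_provider | upper | regex_replace('[^A-Z0-9]', '_') ~ '_API_KEY') %}
```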
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)

- roles/tackle/defaults/main.yml (2 hunks)
- roles/tackle/tasks/kai-status.yml (1 hunks)
- roles/tackle/tasks/kai.yml (2 hunks)
- roles/tackle/templates/deployment-hub.yml.j2 (2 hunks)
- roles/tackle/templates/deployment-ui.yml.j2 (1 hunks)
- roles/tackle/templates/kai/kai-db-deployment.yaml.j2 (2 hunks)
- roles/tackle/templates/kai/llm-proxy-client-configmap.yaml.j2 (1 hunks)
- roles/tackle/templates/kai/llm-proxy-configmap.yaml.j2 (1 hunks)
- roles/tackle/templates/kai/llm-proxy-deployment.yaml.j2 (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- roles/tackle/tasks/kai.yml
- roles/tackle/templates/kai/llm-proxy-configmap.yaml.j2
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: run-ci / e2e-api-integration-tests
🔇 Additional comments (14)
roles/tackle/templates/kai/kai-db-deployment.yaml.j2 (2)
6-6: LGTM: Namespace formatting change.

The unquoted namespace is consistent with standard Jinja2 templating practices.

31-45: The review comment's premise is incorrect; these are not duplicate environment variables.

The code shows an intentional mapping, not duplication:

- The secret stores only `POSTGRESQL_*` keys (`POSTGRESQL_DATABASE`, `POSTGRESQL_PASSWORD`, `POSTGRESQL_USER`, `POSTGRESQL_HOST`)
- The deployment surfaces these as `POSTGRES_*` environment variables, which the official PostgreSQL Docker image expects

This is a necessary conversion between naming conventions. The secret uses `POSTGRESQL_` prefixes (common in Keycloak-based setups), while the PostgreSQL image requires `POSTGRES_`-prefixed environment variables. No action is needed.

Likely an incorrect or invalid review comment.
roles/tackle/templates/deployment-ui.yml.j2 (1)
97-100: LGTM: Clean conditional environment variable.

The `KAI_LLM_PROXY_URL` environment variable is properly gated by `kai_llm_proxy_enabled` and follows the established pattern for conditional feature configuration.

roles/tackle/templates/deployment-hub.yml.j2 (2)
132-133: LGTM: Keycloak audience configuration.

The `KEYCLOAK_AUDIENCE` environment variable is properly placed within the Keycloak configuration block and sourced from the `keycloak_api_audience` default value.

189-192: LGTM: LLM proxy configuration mapping.

The `LLM_PROXY_CONFIG_MAP` environment variable is properly gated by `kai_llm_proxy_enabled` and references the client ConfigMap name, enabling the Hub to serve the LLM proxy configuration to clients.

roles/tackle/tasks/kai-status.yml (3)
23-29: Review complex readiness logic.

The readiness check ensures all replicas are ready and updated. However, the expression is complex and spans multiple lines. Consider the following:

- Line 28: The condition checks `readyReplicas == replicas`, but defaults replicas to 1 if undefined. Verify this default is appropriate; typically deployment replicas should be explicitly defined.
- Line 29: Similar default for the `updatedReplicas` comparison.
- The compound boolean spanning multiple lines may benefit from intermediate variables for clarity.

The logic appears correct, but confirm the default values align with expected deployment behavior.

32-38: LGTM: Blocking and deployment status logic.

The logic correctly determines if Kai is blocked by the proxy (proxy enabled but not ready) and separately tracks Kai API deployment readiness. The compound conditions properly cascade to the status updates.
40-87: LGTM: Comprehensive status condition updates.

The three status conditions (`LLMProxyReady`, `KaiSolutionServerReady`, `KaiAPIKeysConfigured`) are properly structured with:

- Correct status, reason, and message fields
- Proper timestamp formatting
- Appropriate gating (LLMProxyReady only when enabled)
- Clear user-facing messages for troubleshooting

The implementation aligns with Kubernetes status conventions; see the sketch of a rendered condition below.
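For reference, a condition produced by this pattern might render in the CR status roughly like this (field values illustrative):

```yaml
status:
  conditions:
    - type: LLMProxyReady
      status: "True"
      reason: DeploymentReady
      message: "llm-proxy deployment has all replicas ready"
      lastTransitionTime: "2024-01-01T00:00:00Z"
```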
roles/tackle/templates/kai/llm-proxy-deployment.yaml.j2 (3)
24-26: LGTM: Config-driven pod restart mechanism.

The checksum annotation ensures pods restart when the ConfigMap changes, which is essential for configuration updates. This is a standard pattern for configuration-driven deployments.

62-68: LGTM: Self-signed certificate handling.

The conditional SSL certificate configuration properly handles self-signed certificates when Keycloak uses HTTPS, setting both `SSL_CERT_FILE` and `REQUESTS_CA_BUNDLE` environment variables.

74-82: LGTM: Dynamic API key injection.

The template dynamically injects all keys from the `kai-api-keys` secret as environment variables, providing flexibility for different LLM providers. This pattern correctly handles the case where no secret exists.

roles/tackle/defaults/main.yml (3)
123-123: LGTM: Keycloak API audience default.

The `keycloak_api_audience` default value aligns with the PR objectives to integrate JWT validation against Keycloak.

294-303: LGTM: Proxy-specific LLM parameters.

The `kai_llm_params_proxy` structure properly mirrors `kai_llm_params` while pointing to the local proxy service. The model format (`provider_id/model`) and `model_provider: "openai_api"` indicate OpenAI-compatible API usage through the proxy.

321-335: Verify that the silent default on line 335 doesn't mask configuration errors.

The fallback format `remote::<provider_name>` is correct for llama-stack. However, line 335 defaults `kai_llm_provider` to `"openai"` when null, which silently masks missing or invalid configuration:

```yaml
kai_llm_proxy_provider_id: "{{ kai_llm_provider | default('openai') }}"
```

Consider raising an error or requiring explicit configuration instead of silently defaulting to `"openai"`, as sketched below.
roles/tackle/templates/kai/llm-proxy-client-configmap.yaml.j2

```yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ kai_llm_proxy_client_config_name }}
  namespace: {{ app_namespace }}
  labels:
    app.kubernetes.io/name: llm-proxy-client-config
    app.kubernetes.io/component: kai
    app.kubernetes.io/part-of: tackle
data:
  config.json: |
    {
      "model": "{{ kai_llm_proxy_provider_id }}/{{ kai_llm_model }}"
    }
```
🧩 Analysis chain
Verify the model format matches LLM proxy expectations.

The ConfigMap defines a simple client configuration with the model field formatted as `"{{ kai_llm_proxy_provider_id }}/{{ kai_llm_model }}"`. Ensure this format (`provider/model`) is compatible with the LLM proxy service and client implementation.

Additionally, confirm that `kai_llm_model` will always be defined when `kai_llm_proxy_enabled` is true, as this template doesn't handle the null case.
🏁 Script executed:

```bash
#!/bin/bash
# Search for LLM proxy client usage patterns and model format expectations
echo "Checking for LLM proxy model format references..."
rg -nC5 'kai_llm_proxy_provider_id|kai_llm_model|config\.json'
echo -e "\nChecking for null handling in related templates..."
rg -nC3 'kai_llm_model.*null|kai_llm_model\s*\|'
```

Length of output: 9643
Looking at the script output, I need to verify when the llm-proxy-client-configmap.yaml.j2 template is rendered and whether it's conditionally protected.
🏁 Script executed:

```bash
#!/bin/bash
# Find where llm-proxy-client-configmap.yaml.j2 is referenced/used
echo "=== Searching for llm-proxy-client-configmap template references ==="
rg -nC5 'llm-proxy-client-configmap'
echo -e "\n=== Searching for kai_llm_proxy_enabled definition and usage ==="
rg -nC3 'kai_llm_proxy_enabled'
echo -e "\n=== Checking conditionals in KAI-related role files ==="
fd -e yml -e j2 'kai|llm' roles/tackle/templates/kai/ --exec grep -l 'if kai_llm' {} \;
```

Length of output: 5586
🏁 Script executed:

```bash
#!/bin/bash
# Also check how this template is registered in the role
echo "=== Checking role tasks for template handling ==="
fd -e yml 'main\.yml|tasks' roles/tackle/ --exec grep -l 'llm-proxy-client' {} \;
```

Length of output: 80
🏁 Script executed:

```bash
#!/bin/bash
# Find where this template is used in role tasks
echo "=== Searching for template task that renders llm-proxy-client-configmap ==="
rg -n 'llm-proxy-client-configmap' roles/tackle/tasks/
echo -e "\n=== Searching all tasks/main.yml files in tackle role ==="
cat roles/tackle/tasks/main.yml | head -100
echo -e "\n=== Looking for kai_llm_proxy_enabled in tasks ==="
rg -n 'kai_llm_proxy_enabled' roles/tackle/tasks/
```

Length of output: 3921
🏁 Script executed:

```bash
#!/bin/bash
# Search for validation or interdependencies between kai_llm_proxy_enabled and kai_llm_model
echo "=== Looking for validation logic or guards ==="
rg -nC5 'kai_llm_proxy_enabled.*kai_llm_model|kai_llm_model.*kai_llm_proxy_enabled'
echo -e "\n=== Checking if kai_llm_model has other conditional checks ==="
rg -nB3 -A3 'kai_llm_model' roles/tackle/tasks/kai.yml
echo -e "\n=== Searching in entire kai.yml for related context ==="
cat roles/tackle/tasks/kai.yml | grep -A20 'when: kai_llm_proxy_enabled'
```

Length of output: 796
Add a null check for kai_llm_model to prevent invalid JSON when the proxy is enabled without a model specified.

The ConfigMap template uses `kai_llm_model` without null protection, unlike the related `llm-proxy-configmap.yaml.j2` template, which wraps the model reference in a `{% if kai_llm_model %}` check. If `kai_llm_proxy_enabled` is true but `kai_llm_model` remains null (from defaults), the rendered JSON becomes invalid: `"model": "openai/None"`.

Add the same null-checking pattern from the sibling template to ensure the ConfigMap is only created when both conditions are met.
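A minimal sketch of the guarded data section, mirroring the sibling template's `{% if kai_llm_model %}` pattern:

```jinja
data:
  config.json: |
{% if kai_llm_model %}
    {
      "model": "{{ kai_llm_proxy_provider_id }}/{{ kai_llm_model }}"
    }
{% else %}
    {}
{% endif %}
```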
```yaml
startupProbe:
  httpGet:
    path: /v1/health
    port: 8321
  failureThreshold: 30
  periodSeconds: 10
  initialDelaySeconds: 10
livenessProbe:
  httpGet:
    path: /v1/health
    port: 8321
  periodSeconds: 30
  timeoutSeconds: 5
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /v1/health
    port: 8321
  periodSeconds: 10
  timeoutSeconds: 5
  successThreshold: 1
  failureThreshold: 3
```
🧩 Analysis chain
Verify health endpoint and probe timings.

The deployment uses `/v1/health` for all probes with the following configuration:

- Startup: 30 failures × 10s = 5 minutes max startup time
- Liveness: 30s period, 3 failures before restart
- Readiness: 10s period

Confirm that:

- The `/v1/health` endpoint exists in the `llamastack/distribution-starter:latest` image
- The 5-minute startup window is sufficient for model initialization
- The liveness failure threshold (90 seconds) allows for temporary slow responses without unnecessary restarts
🌐 Web query:
What health check endpoints does llama-stack distribution-starter provide?
💡 Result:
The starter distribution exposes HTTP health endpoints under /v1, specifically:
- GET /v1/health (generic health). [1]
- GET /v1/health/ready (readiness). [2]
- GET /v1/health/live (liveness). [2]
Sources: starter/bring-your-own docs showing /v1/health and the API reference showing /v1/health/ready and /v1/health/live. [1][2]
Use dedicated health check endpoints for readiness and liveness probes.

The `/v1/health` endpoint exists, but llama-stack distribution-starter also provides dedicated `/v1/health/ready` and `/v1/health/live` endpoints. Update the configuration to use:

- `readinessProbe` → `/v1/health/ready`
- `livenessProbe` → `/v1/health/live`
- `startupProbe` → `/v1/health` (or `/v1/health/live`)

Also verify through deployment testing that:

- The 5-minute startup window is sufficient for model initialization
- The 90-second liveness failure threshold (30s period × 3 failures) allows for temporary slow responses

A sketch of the adjusted probes follows.
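The sketch below uses the endpoints from the llama-stack docs quoted above, with timings carried over from the current template:

```yaml
readinessProbe:
  httpGet:
    path: /v1/health/ready
    port: 8321
  periodSeconds: 10
  timeoutSeconds: 5
  successThreshold: 1
  failureThreshold: 3
livenessProbe:
  httpGet:
    path: /v1/health/live
    port: 8321
  periodSeconds: 30
  timeoutSeconds: 5
  failureThreshold: 3
```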
🤖 Prompt for AI Agents
In roles/tackle/templates/kai/llm-proxy-deployment.yaml.j2 around lines 35 to
56, the probes all point to /v1/health; change them so readinessProbe uses
/v1/health/ready, livenessProbe uses /v1/health/live, and keep startupProbe
pointed at /v1/health (or /v1/health/live if preferred), then ensure the
existing timing values reflect desired behavior (confirm startupProbe
initialDelaySeconds and failureThreshold provide ~5 minutes for model init, and
that liveness periodSeconds × failureThreshold (~90s) tolerates transient
slowness) and run deployment tests to validate the timings and endpoints return
expected HTTP success codes.
Signed-off-by: Fabian von Feilitzsch <[email protected]>
Signed-off-by: Fabian von Feilitzsch <[email protected]>
```yaml
- name: KAI_URL
  value: "{{ kai_url }}"
{% if kai_llm_proxy_enabled %}
- name: LLM_PROXY_CONFIG_MAP
```
This isn't necessary. The hub endpoint serves the configmap by name.
Yep, missed that on cleanup.
```yaml
- name: TACKLE_HUB_URL
  value: "{{ hub_url }}"
{% if kai_llm_proxy_enabled|bool %}
- name: KAI_LLM_PROXY_URL
```
I thought we decided the new llm proxy service would be mapped in the main rev proxy instead of brokering through the hub.
Is this not setting it in the main reverse proxy? I set this envvar to be used here: https://github.com/konveyor/tackle2-ui/pull/2743/files
Yes. My mistake. Thought it was the hub deployment.
Signed-off-by: Fabian von Feilitzsch <[email protected]>
jortel
left a comment
LGTM
Defer to @jmontleon regarding general ansible-operator things.
Closes #487
Depends on konveyor/tackle2-hub#953
Still TODO: figure out the proper way to set the ingress host, etc., and create a provider-settings configmap that the hub can serve to users.