
Conversation

Contributor
@fabianvf fabianvf commented Nov 14, 2025

Closes #487
Depends on konveyor/tackle2-hub#953

Still TODO: figure out the proper way to set the ingress host, etc., and create a provider-settings ConfigMap that the hub can serve to users.

Summary by CodeRabbit

  • New Features

    • Optional LLM proxy service with multi-provider support, deployable proxy, client config, and UI integration when enabled.
  • Monitoring

    • Component status updates for LLM proxy readiness, API server readiness, and API-keys configured conditions.
  • Configuration

  • Keycloak API audience added; UI exposes the proxy URL when enabled; secret-driven API key envs rendered conditionally; additional DB env vars surfaced.
  • Chores

    • Added a debug log for the Kai solution server enablement flag.


Signed-off-by: Fabian von Feilitzsch <[email protected]>

coderabbitai bot commented Nov 14, 2025

Walkthrough

Adds an LLM proxy feature: new defaults and provider mappings, Ansible tasks to deploy the proxy and update Kai CR status, and multiple Kubernetes templates (ConfigMaps, Deployment, Service, client config) with Keycloak and multi-provider LLM wiring.

Changes

Cohort / File(s) Summary
Configuration defaults
roles/tackle/defaults/main.yml
Adds keycloak_api_audience and LLM-proxy defaults and mappings (kai_llm_params_proxy, kai_llm_proxy_enabled, kai_llm_proxy_image_fqin, kai_llm_proxy_url, kai_llm_proxy_client_config_name, kai_llm_proxy_provider_type_map, kai_llm_proxy_provider_type, kai_llm_proxy_provider_id, kai_llm_proxy_api_key_env_map, kai_llm_proxy_api_key_env) and provider selection logic.
Ansible tasks
roles/tackle/tasks/main.yml, roles/tackle/tasks/kai.yml, roles/tackle/tasks/kai-status.yml
Adds a debug task for kai_solution_server_enabled; conditional "Deploy LLM Proxy" block (ConfigMap, Deployment, Service, client ConfigMap) gated by kai_llm_proxy_enabled; kai-status.yml queries llm-proxy and kai-api Deployments, computes readiness flags and updates CR conditions via operator_sdk.util.k8s_status.
Hub & UI deployment templates
roles/tackle/templates/deployment-hub.yml.j2, roles/tackle/templates/deployment-ui.yml.j2
Adds KEYCLOAK_AUDIENCE env in hub template and conditional KAI_LLM_PROXY_URL env var in UI template when proxy enabled.
Kai component templates
roles/tackle/templates/kai/kai-api-deployment.yaml.j2, roles/tackle/templates/kai/kai-db-deployment.yaml.j2
kai-api template: makes secret-derived env vars conditional on secret data presence. kai-db template: unquotes the namespace and adds POSTGRES_USER, POSTGRES_PASSWORD, and POSTGRES_DB env vars sourced from secrets.
LLM proxy resources (new templates)
roles/tackle/templates/kai/llm-proxy-configmap.yaml.j2, roles/tackle/templates/kai/llm-proxy-deployment.yaml.j2, roles/tackle/templates/kai/llm-proxy-service.yaml.j2, roles/tackle/templates/kai/llm-proxy-client-configmap.yaml.j2
New ConfigMap (run.yaml with multi-provider configs, storage, auth, telemetry), Deployment (probes, envs, secret injections, ConfigMap checksum restart, optional TLS CA mounts), Service (ClusterIP:8321), and client ConfigMap (config.json with model). Provider-specific branches and Keycloak OAuth wiring included.

Sequence Diagram(s)

sequenceDiagram
    participant Ansible as Ansible Role
    participant K8s as Kubernetes API
    participant Operator as Operator (CR updates)

    Ansible->>K8s: create ConfigMap (llm-proxy), Deployment, Service, client ConfigMap (when enabled)
    opt ansible_operator_meta defined
        Ansible->>Operator: include kai-status.yml
        Operator->>K8s: get Deployment llm-proxy
        Operator->>K8s: get Deployment kai-api
        Operator->>Operator: compute readiness flags
        Operator->>K8s: update CR conditions (LLMProxyReady, KaiSolutionServerReady, KaiAPIKeysConfigured)
    end
sequenceDiagram
    participant Client as Client
    participant Proxy as LLM Proxy
    participant Keycloak as Keycloak
    participant Provider as LLM Provider
    participant DB as PostgreSQL

    Client->>Proxy: request + JWT
    Proxy->>Keycloak: validate token
    alt token valid
        Keycloak-->>Proxy: validated
        Proxy->>Provider: forward request (provider-specific credentials)
        Provider-->>Proxy: LLM response
        Proxy->>DB: store/track request
        Proxy-->>Client: return response
    else token invalid
        Keycloak-->>Proxy: invalid
        Proxy-->>Client: 401/deny
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Areas needing extra attention:
    • roles/tackle/templates/kai/llm-proxy-configmap.yaml.j2 (large, complex templating, many provider branches, Keycloak wiring).
    • roles/tackle/tasks/kai-status.yml (readiness logic, timestamp formatting, condition updates).
    • Defaults and provider-mapping logic in roles/tackle/defaults/main.yml.
    • Cross-template env var/secret name consistency (kai_api_key_secret_status usage across templates).
    • Deployment checksum annotation and volume/secret mount conditionals in llm-proxy-deployment template.

Suggested reviewers

  • jwmatthews

Poem

🐰 I stitched a config, small and spry,

A proxy hops where models lie,
Tokens checked and routes aligned,
Secrets snug, and services signed,
Hooray — Kai's new helper bounds nearby.

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
  • Description Check (✅ Passed): Check skipped - CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The title clearly and concisely summarizes the main change: adding LLM proxy deployment functionality. It is directly related to the changeset.
  • Linked Issues Check (✅ Passed): Changes implement all core requirements: externally reachable proxy via Service/Deployment, LLM credential configuration via defaults/templates, and JWT/Keycloak validation setup in llm-proxy-configmap.yaml.j2.
  • Out of Scope Changes Check (✅ Passed): All changes are scoped to LLM proxy deployment and supporting infrastructure. Minor additions like keycloak_api_audience and kai-status.yml monitoring are directly supporting the proxy deployment.
  • Docstring Coverage (✅ Passed): No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 60119dd and 2fa9e78.

📒 Files selected for processing (1)
  • roles/tackle/templates/deployment-hub.yml.j2 (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • roles/tackle/templates/deployment-hub.yml.j2
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build-operator-bundle


@fabianvf fabianvf marked this pull request as draft November 14, 2025 17:06
@fabianvf fabianvf requested a review from jmontleon November 14, 2025 17:08
coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (7)
roles/tackle/tasks/main.yml (1)

914-917: Debug logging is safe; consider gating if it becomes noisy

This debug task mirrors the existing when: kai_solution_server_enabled | bool usage and doesn’t introduce new failure modes, but it will log on every run. If logs get too chatty over time, consider wrapping it in a when: ansible_verbosity|int >= 2 (or similar) so it only appears in more verbose runs.
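
A minimal sketch of that gate (the task body is assumed from the PR summary, not copied from it; ansible_verbosity is Ansible's built-in verbosity variable):

    - name: Debug kai_solution_server_enabled
      debug:
        var: kai_solution_server_enabled
      when:
        - kai_solution_server_enabled | bool
        - ansible_verbosity | int >= 2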

roles/tackle/templates/kai/kai-api-deployment.yaml.j2 (1)

29-37: Good defensive guard around secret-derived env vars

The if around kai_api_key_secret_status.resources | length > 0 correctly prevents indexing resources.0 when the secret is missing, avoiding template failures while keeping behavior unchanged when it exists. If you want to tighten it further, you could also guard resources with a | default([]) to handle the (unlikely) case where kai_api_key_secret_status is unset, but that’s optional.
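
For illustration, the tightened guard could read (sketch only, using the variable named above):

    {% if kai_api_key_secret_status.resources | default([]) | length > 0 %}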

roles/tackle/templates/kai/kai-db-deployment.yaml.j2 (1)

31-45: Adding POSTGRES_* aliases is reasonable; confirm image expectations

Mirroring the existing POSTGRESQL_* vars with POSTGRES_* aliases wired to the same secret keys is a sensible way to support images/tooling expecting either naming scheme. Just confirm that the kai_database_image_fqin image actually consumes the desired variant(s) so you’re not carrying unused env vars.

roles/tackle/templates/kai/llm-proxy-ingress.yaml.j2 (1)

1-45: Ingress wiring is sound; consider refining TLS placeholder

The ingress correctly targets llm-proxy:8321, parametrizes ingressClassName, and sets reasonable nginx/alb annotations. The tls: - {} placeholder, though valid YAML, may behave inconsistently across controllers; you might prefer to either omit tls entirely when no secret/hosts are configured or model it after whatever pattern you use in the existing UI ingress template.
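
One hedged way to express the conditional form (the variable names here are hypothetical placeholders, not existing role defaults):

    {% if kai_llm_proxy_tls_secret_name is defined %}
      tls:
        - hosts:
            - {{ kai_llm_proxy_host }}
          secretName: {{ kai_llm_proxy_tls_secret_name }}
    {% endif %}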

roles/tackle/templates/kai/llm-proxy-deployment.yaml.j2 (1)

62-68: Normalize feature_auth_required to a boolean in Jinja conditionals

Here you use:

{% if feature_auth_required and keycloak_sso_proto == 'https' %}
  ...
{% endif %}

and similarly for the trusted-ca volume. Elsewhere in the role, feature_auth_required is typically used as feature_auth_required | bool because it’s derived via a Jinja expression and may be a string value.

For consistency (and to avoid surprises where a string "false" still evaluates truthy), consider:

-{% if feature_auth_required and keycloak_sso_proto == 'https' %}
+{% if feature_auth_required | bool and keycloak_sso_proto == 'https' %}
...
-{% if feature_auth_required and keycloak_sso_proto == 'https' %}
+{% if feature_auth_required | bool and keycloak_sso_proto == 'https' %}

This keeps the behavior aligned with the rest of the templates and ensures the trusted CA wiring only kicks in when auth is actually required.

Also applies to: 89-115

roles/tackle/defaults/main.yml (1)

123-123: LLM proxy defaults are coherent; consider guarding kai_llm_model in computed model id

The new defaults for keycloak_api_audience, kai_llm_params_proxy, and the llm_proxy_* provider/API-key mappings line up well with the deployment and ConfigMap templates.

One small edge case: with the current definition

kai_llm_params_proxy:
  model: "{{ llm_proxy_provider_id }}/{{ kai_llm_model }}"

and kai_llm_model defaulting to null, you’ll end up with values like openai/None if someone enables the proxy without explicitly setting a model. You might want a safer default or explicit guard, for example:

-kai_llm_params_proxy:
-  model: "{{ llm_proxy_provider_id }}/{{ kai_llm_model }}"
+kai_llm_params_proxy:
+  model: >-
+    {{ (llm_proxy_provider_id ~ '/' ~ kai_llm_model)
+       if kai_llm_model else None }}

or simply document that kai_llm_model is required whenever kai_llm_proxy_enabled is set.

Also applies to: 294-351

roles/tackle/tasks/kai-status.yml (1)

22-39: Downgrade severity: This is optional refactoring, not a critical issue

The code currently works correctly due to Jinja2's reliable short-circuit evaluation of the and operator. When kai_llm_proxy_enabled is false (the default), the first part of the AND expression fails and llm_proxy_deployment.resources is never evaluated—preventing any undefined variable error.

However, the review's suggestion is valid as a defensive programming practice to make the guard explicit rather than relying on implicit short-circuit behavior. This improves code clarity and maintainability.

The suggested fix is sound, but this is a code quality improvement recommendation, not a critical bug:

- name: Set LLM proxy ready status
  when: kai_llm_proxy_enabled | bool and (llm_proxy_deployment is defined)
  set_fact:
    llm_proxy_ready: "{{ llm_proxy_deployment.resources | length > 0 and
                         llm_proxy_deployment.resources[0].status.readyReplicas | default(0) > 0 and
                         llm_proxy_deployment.resources[0].status.readyReplicas | default(0) == llm_proxy_deployment.resources[0].status.replicas | default(1) and
                         llm_proxy_deployment.resources[0].status.updatedReplicas | default(0) == llm_proxy_deployment.resources[0].status.replicas | default(1) }}"
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 75da850 and 7b24a47.

📒 Files selected for processing (11)
  • roles/tackle/defaults/main.yml (2 hunks)
  • roles/tackle/tasks/kai-status.yml (1 hunks)
  • roles/tackle/tasks/kai.yml (2 hunks)
  • roles/tackle/tasks/main.yml (1 hunks)
  • roles/tackle/templates/deployment-hub.yml.j2 (1 hunks)
  • roles/tackle/templates/kai/kai-api-deployment.yaml.j2 (1 hunks)
  • roles/tackle/templates/kai/kai-db-deployment.yaml.j2 (1 hunks)
  • roles/tackle/templates/kai/llm-proxy-configmap.yaml.j2 (1 hunks)
  • roles/tackle/templates/kai/llm-proxy-deployment.yaml.j2 (1 hunks)
  • roles/tackle/templates/kai/llm-proxy-ingress.yaml.j2 (1 hunks)
  • roles/tackle/templates/kai/llm-proxy-service.yaml.j2 (1 hunks)
🔇 Additional comments (3)
roles/tackle/templates/kai/llm-proxy-service.yaml.j2 (1)

1-19: Service wiring looks correct; ensure pod labels match selector

ClusterIP service on port 8321 with selector app: llm-proxy aligns with the ingress/backend usage. Just double-check the llm-proxy Deployment labels its pods with app: llm-proxy so this selector actually picks them up.

roles/tackle/templates/deployment-hub.yml.j2 (1)

101-134: KEYCLOAK_AUDIENCE wiring looks consistent; ensure it matches Keycloak client config

Adding KEYCLOAK_AUDIENCE sourced from keycloak_api_audience under the existing Keycloak-only block is consistent with how the same variable is used in the LLM proxy config. Just make sure the value ("konveyor-api" by default) actually matches the audience/client settings in Keycloak so JWT validation is coherent across Hub, Kai, and the proxy.

roles/tackle/tasks/kai.yml (1)

149-152: Status update hook for Kai components looks correct

Importing kai-status.yml only when ansible_operator_meta is defined is a clean way to keep CR status updates operator-only while reusing the same Kai deployment tasks elsewhere.

Comment on lines 71 to 139
- name: Get Keycloak external URL
  when:
    - kai_llm_proxy_enabled | bool
    - feature_auth_required | bool
    - keycloak_external_url is not defined
  block:
    - name: "Get Keycloak route on OpenShift"
      when: openshift_cluster
      k8s_info:
        api_version: route.openshift.io/v1
        kind: Route
        name: "{{ keycloak_sso_name }}"
        namespace: "{{ app_namespace }}"
      register: keycloak_route

    - name: "Set Keycloak external URL from route"
      when:
        - openshift_cluster
        - keycloak_route.resources | length > 0
      set_fact:
        keycloak_external_url: "https://{{ keycloak_route.resources[0].spec.host }}"

    - name: "Get Keycloak ingress on Kubernetes"
      when: not openshift_cluster
      k8s_info:
        api_version: networking.k8s.io/v1
        kind: Ingress
        name: "{{ keycloak_sso_name }}"
        namespace: "{{ app_namespace }}"
      register: keycloak_ingress

    - name: "Set Keycloak external URL from ingress"
      when:
        - not openshift_cluster
        - keycloak_ingress.resources | length > 0
      set_fact:
        keycloak_external_url: "http://{{ keycloak_ingress.resources[0].spec.rules[0].host }}/auth"

- name: Deploy LLM Proxy
  when: kai_llm_proxy_enabled | bool
  block:
    - name: Create LLM Proxy ConfigMap
      k8s:
        state: present
        template: kai/llm-proxy-configmap.yaml.j2

    - name: Create LLM Proxy Deployment
      k8s:
        state: present
        template: kai/llm-proxy-deployment.yaml.j2

    - name: Create LLM Proxy Service
      k8s:
        state: present
        template: kai/llm-proxy-service.yaml.j2

    - name: Create LLM Proxy Ingress
      k8s:
        state: present
        template: kai/llm-proxy-ingress.yaml.j2
      when: not openshift_cluster

# TODO: Determine LLM proxy external URL
# After creating the ingress/route, we should:
# 1. Query the ingress/route to get the actual external URL
# 2. Store it as llm_proxy_external_url
# 3. Make it available to users (perhaps in status or ConfigMap)
# This URL is needed for external clients to connect to the proxy


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Keycloak external URL discovery is good, but currently unused and path/scheme may not match real Keycloak

The route/ingress discovery logic is a solid start, but a couple of points to tighten:

  • keycloak_external_url isn’t used in the provided llm-proxy-configmap.yaml.j2; the proxy auth section still points at the internal service. If the JWT iss claim is based on the external URL, you’ll likely want to plumb keycloak_external_url into the LLM proxy’s issuer / jwks.uri instead of (or in addition to) the cluster-internal host.
  • The Kubernetes Ingress branch hardcodes http://{{ keycloak_ingress.resources[0].spec.rules[0].host }}/auth. That may not match clusters where:
    • TLS is terminated at the ingress (should be https://), or
    • the Keycloak path isn’t /auth (or is already included in the ingress path).

You might want to:

  • Normalize keycloak_external_url to a single base (scheme + host [+ optional base path]) and
  • Derive the issuer/JWKS URLs from that in the proxy config, with a fallback to the internal service if discovery fails.

🏁 Script executed:

#!/bin/bash
# Find the LLM proxy configuration template
fd -t f "llm-proxy" --type file | head -20

Length of output: 216


🏁 Script executed:

#!/bin/bash
# Search for references to keycloak_external_url
rg -n "keycloak_external_url" --type yaml --type j2

Length of output: 26


🏁 Script executed:

#!/bin/bash
# Find and examine the llm-proxy-configmap template
find . -name "llm-proxy-configmap.yaml.j2" -o -name "*llm*proxy*" -type f 2>/dev/null | head -20

Length of output: 224


🏁 Script executed:

#!/bin/bash
# Search for keycloak references in proxy or auth config
rg -n -C2 "keycloak|issuer|jwks" --type yaml --type j2 | grep -A2 -B2 "proxy\|auth\|config" | head -50

Length of output: 26


🏁 Script executed:

#!/bin/bash
# Look at the kai.yml file around lines 71-139 for context
cat roles/tackle/tasks/kai.yml | sed -n '60,150p'

Length of output: 2917


🏁 Script executed:

#!/bin/bash
# Examine the LLM proxy configmap template
cat roles/tackle/templates/kai/llm-proxy-configmap.yaml.j2

Length of output: 3765


🏁 Script executed:

#!/bin/bash
# Search for keycloak_external_url in all files
rg -n "keycloak_external_url" --no-type

Length of output: 435


🏁 Script executed:

#!/bin/bash
# Search for keycloak references in templates (no type filter)
rg -n "keycloak" roles/tackle/templates/kai/ -A2 -B2

Length of output: 2310


🏁 Script executed:

#!/bin/bash
# Search for issuer and jwks patterns
rg -n "issuer|jwks|iss_uri|token_endpoint" roles/tackle/templates/kai/

Length of output: 277


🏁 Script executed:

#!/bin/bash
# Look at all template files to understand structure
ls -la roles/tackle/templates/kai/ | grep -E "llm-proxy|kai-api"

Length of output: 453


🏁 Script executed:

#!/bin/bash
# Search entire codebase for any reference to keycloak_external_url
rg "keycloak_external_url"

Length of output: 320


🏁 Script executed:

#!/bin/bash
# Check if keycloak_sso_service_name and keycloak_sso_port are defined
rg -n "keycloak_sso_service_name|keycloak_sso_port" roles/tackle/defaults/ roles/tackle/vars/ 2>/dev/null | head -20

Length of output: 916


🏁 Script executed:

#!/bin/bash
# Verify what keycloak_sso_service_name typically resolves to
rg -n "keycloak_sso_service_name.*=" roles/tackle/

Length of output: 0


Keycloak external URL is discovered but unused—JWT validation will fail

The review comment is accurate. Verification confirms:

  • keycloak_external_url is computed in roles/tackle/tasks/kai.yml (lines ~94, ~104) but has zero references anywhere else in the codebase—it is never passed to any template.
  • The LLM proxy auth config (roles/tackle/templates/kai/llm-proxy-configmap.yaml.j2, lines 113–114) hardcodes internal Kubernetes service URLs (keycloak_sso_service_name) instead of using the external URL.
  • If Keycloak issues JWTs with an external issuer (standard for ingress/route deployments), the proxy will reject them because the issuer claim won't match the internal service URL configured in the proxy.
  • The hardcoded http:// scheme (lines 104 and 113) may be incorrect if TLS terminates at the ingress.
  • The hardcoded /auth path (lines 104 and 113) assumes a fixed Keycloak path structure that may not hold across deployments.

Recommendation: Plumb keycloak_external_url into the proxy config template as the issuer/JWKS base, and derive the full URIs from it with a fallback to the internal service if discovery fails.
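
A sketch of that plumbing in the ConfigMap template (the keycloak_base local is hypothetical; the other variables exist in this role, and the /auth path mirrors the current config):

    {% set keycloak_base = keycloak_external_url
         | default(keycloak_sso_proto ~ '://' ~ keycloak_sso_service_name ~ '.' ~ app_namespace ~ '.svc:' ~ keycloak_sso_port ~ '/auth', true) %}
      auth:
        provider_config:
          type: "oauth2_token"
          jwks:
            uri: "{{ keycloak_base }}/realms/{{ keycloak_sso_realm }}/protocol/openid-connect/certs"
          issuer: "{{ keycloak_base }}/realms/{{ keycloak_sso_realm }}"
          audience: "{{ keycloak_api_audience }}"

The second argument to default(..., true) treats an empty string as undefined, so the internal service URL is used whenever discovery did not populate keycloak_external_url.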

🤖 Prompt for AI Agents
In roles/tackle/tasks/kai.yml lines 71-139, keycloak_external_url is computed
but never used; update the task flow to pass keycloak_external_url into the LLM
Proxy ConfigMap template (e.g., add vars to the k8s template call or set a fact
consumed by the template) so the proxy can use it as the issuer/JWKS base;
modify the value you pass to derive full URIs from keycloak_external_url
(respecting its scheme/host/path) and only fall back to the internal service
name (keycloak_sso_service_name and port) if discovery is missing, and avoid
hardcoding "http://" or the "/auth" path in the task—let the template construct
issuer/jwks endpoints from the provided external URL with a safe fallback to
internal service when needed.

Comment on lines +106 to +118
server:
  port: 8321
{% if feature_auth_required %}
  auth:
    provider_config:
      type: "oauth2_token"
      jwks:
        uri: "http://{{ keycloak_sso_service_name }}:{{ keycloak_sso_port }}/auth/realms/{{ keycloak_sso_realm }}/protocol/openid-connect/certs"
      issuer: "http://{{ keycloak_sso_service_name }}.{{ app_namespace }}.svc:{{ keycloak_sso_port }}/auth/realms/{{ keycloak_sso_realm }}"
      audience: "{{ keycloak_api_audience }}"
{% endif %}
telemetry:
  enabled: false
No newline at end of file

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Align OAuth2 issuer/jwks.uri with Keycloak scheme and external URL

The auth config is currently:

      auth:
        provider_config:
          type: "oauth2_token"
          jwks:
            uri: "http://{{ keycloak_sso_service_name }}:{{ keycloak_sso_port }}/auth/realms/{{ keycloak_sso_realm }}/protocol/openid-connect/certs"
          issuer: "http://{{ keycloak_sso_service_name }}.{{ app_namespace }}.svc:{{ keycloak_sso_port }}/auth/realms/{{ keycloak_sso_realm }}"
          audience: "{{ keycloak_api_audience }}"

Issues:

  • The scheme is hardcoded to http:// even though keycloak_sso_port is 8443 when TLS is enabled. That will fail against a TLS-only Keycloak.
  • Both issuer and jwks.uri point at the cluster-internal service name, while Keycloak typically mints tokens with iss pointing at the external URL (route/ingress, including KEYCLOAK_FRONTEND_URL). This mismatch can cause issuer validation to fail in the LLM proxy.

At minimum, it’s safer to respect keycloak_sso_proto:

-          jwks:
-            uri: "http://{{ keycloak_sso_service_name }}:{{ keycloak_sso_port }}/auth/realms/{{ keycloak_sso_realm }}/protocol/openid-connect/certs"
-          issuer: "http://{{ keycloak_sso_service_name }}.{{ app_namespace }}.svc:{{ keycloak_sso_port }}/auth/realms/{{ keycloak_sso_realm }}"
+          jwks:
+            uri: "{{ keycloak_sso_proto }}://{{ keycloak_sso_service_name }}.{{ app_namespace }}.svc:{{ keycloak_sso_port }}/auth/realms/{{ keycloak_sso_realm }}/protocol/openid-connect/certs"
+          issuer: "{{ keycloak_sso_proto }}://{{ keycloak_sso_service_name }}.{{ app_namespace }}.svc:{{ keycloak_sso_port }}/auth/realms/{{ keycloak_sso_realm }}"

Given you already compute keycloak_external_url in roles/tackle/tasks/kai.yml when the LLM proxy + auth are enabled, you may want to go further and base issuer/jwks.uri on that external URL (with a fallback to the internal service) so they match the iss claim seen by external clients.

I’d recommend validating this end‑to‑end with a real JWT from your Keycloak to ensure the proxy’s issuer/audience checks pass as intended.


I'll verify the review comment by examining the codebase for the relevant configuration variables and patterns.


🏁 Script executed:

#!/bin/bash
# 1. Display the template file to verify current state
echo "=== Template content ===" 
cat roles/tackle/templates/kai/llm-proxy-configmap.yaml.j2

echo -e "\n=== Searching for keycloak_sso_proto definition ==="
rg -n "keycloak_sso_proto" --type yaml --type yml -C 3

echo -e "\n=== Searching for keycloak_external_url ==="
rg -n "keycloak_external_url" -C 3

echo -e "\n=== Searching for keycloak_sso_port definition ==="
rg -n "keycloak_sso_port" --type yaml --type yml -C 2

Length of output: 5442


I need to search without the file type filter that caused the error:


🏁 Script executed:

#!/bin/bash
# Search for keycloak_sso_proto definition
echo "=== Searching for keycloak_sso_proto ==="
rg -n "keycloak_sso_proto" -C 3

echo -e "\n=== Searching for keycloak_sso_port ==="
rg -n "keycloak_sso_port" -C 3

echo -e "\n=== Looking for variable defaults in defaults/main.yml ==="
find . -name "defaults" -o -name "main.yml" | head -20
rg -n "keycloak" roles/tackle/defaults/main.yml 2>/dev/null | head -30

echo -e "\n=== Checking if keycloak_sso_proto is used elsewhere ==="
rg -n "keycloak_sso_proto\|keycloak_sso_port" --type py --type yml --type yaml --type j2 | head -40

Length of output: 9740


Use keycloak_sso_proto for OAuth2 auth URLs to fix TLS mismatch

The review identifies a confirmed bug in roles/tackle/templates/kai/llm-proxy-configmap.yaml.j2 lines 113–114. The auth config hardcodes the scheme to http:// even though:

  • keycloak_sso_proto is defined as 'https' when TLS is enabled and 'http' when disabled
  • keycloak_sso_port is 8443 when TLS is enabled

This causes OAuth2 issuer validation to fail in HTTPS/TLS deployments. Additionally, keycloak_external_url is already computed in the role when LLM proxy + auth are enabled, so using it for the issuer and jwks.uri would align token validation with external client expectations.

At minimum, update lines 113–114 to use {{ keycloak_sso_proto }} instead of hardcoded http://. Ideally, base both URLs on the computed keycloak_external_url to match Keycloak's token issuer claim.

🤖 Prompt for AI Agents
roles/tackle/templates/kai/llm-proxy-configmap.yaml.j2 lines 106 to 118: the
OAuth2 URLs are hardcoded with "http://" which breaks issuer/jwks validation in
TLS deployments; replace the hardcoded scheme with the proper templated value
and preferably use the precomputed keycloak_external_url: update jwks.uri and
issuer to build URLs using {{ keycloak_sso_proto }} and {{
keycloak_sso_service_name }}, {{ keycloak_sso_port }} and {{ keycloak_sso_realm
}} or, better, use {{ keycloak_external_url }} (ensuring it includes proto, host
and port) so both jwks.uri and issuer match Keycloak's external issuer value.

Signed-off-by: Fabian von Feilitzsch <[email protected]>
@fabianvf fabianvf marked this pull request as ready for review November 20, 2025 15:19
apiVersion: v1
kind: ConfigMap
metadata:
  name: llm-proxy-config
Contributor

@jortel jortel Nov 20, 2025
-config seems redundant on a configMap. Like having all the pod resources have a -pod suffix in the name. just a nit.

Contributor Author

fixed

coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (3)
roles/tackle/templates/kai/llm-proxy-deployment.yaml.j2 (1)

94-100: Review resource limits for production workloads.

The current resource configuration:

  • Requests: 256Mi RAM, 100m CPU
  • Limits: 512Mi RAM, 500m CPU

These limits may be insufficient for LLM proxy operations under load. Consider:

  1. Whether 512Mi RAM is adequate for the proxy's memory footprint
  2. If 500m CPU (0.5 cores) provides acceptable response times
  3. Whether these should be configurable via variables (similar to other components in the codebase)

Consider making resources configurable:

In roles/tackle/defaults/main.yml, add:

kai_llm_proxy_container_requests_cpu: "100m"
kai_llm_proxy_container_requests_memory: "256Mi"
kai_llm_proxy_container_limits_cpu: "500m"
kai_llm_proxy_container_limits_memory: "512Mi"

Then update the template:

           resources:
             requests:
-              memory: "256Mi"
-              cpu: "100m"
+              memory: "{{ kai_llm_proxy_container_requests_memory }}"
+              cpu: "{{ kai_llm_proxy_container_requests_cpu }}"
             limits:
-              memory: "512Mi"
-              cpu: "500m"
+              memory: "{{ kai_llm_proxy_container_limits_memory }}"
+              cpu: "{{ kai_llm_proxy_container_limits_cpu }}"
roles/tackle/defaults/main.yml (2)

312-319: LGTM: LLM proxy feature flags and service configuration.

The defaults provide sensible values:

  • kai_llm_proxy_enabled: false (opt-in)
  • Internal service URL with explicit port
  • Versioned container image reference

Note that llamastack/distribution-starter:latest uses a mutable tag. Consider pinning to a specific version for production deployments to ensure reproducibility.
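
For example, a pinned default might look like this (the tag is a placeholder, not a verified release):

    kai_llm_proxy_image_fqin: "docker.io/llamastack/distribution-starter:<pinned-version>"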


337-348: Review API key environment variable mapping.

The kai_llm_proxy_api_key_env_map provides mappings for common providers. The fallback logic on line 348 constructs <PROVIDER>_API_KEY by uppercasing the provider name.

Consider edge cases:

  1. Provider names with hyphens (e.g., "azure-openai") would become "AZURE-OPENAI_API_KEY" (invalid env var name)
  2. Multi-word providers may not follow the simple uppercase pattern

For robustness, document the expected provider name format or add validation.
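
A hedged sketch of such a validation task (the task wording and placement are illustrative; the provider variable is from this PR's defaults):

    - name: Validate LLM provider name can map to an API key env var
      assert:
        that:
          - kai_llm_proxy_provider_id is match('^[A-Za-z0-9_]+$')
        fail_msg: >-
          Provider '{{ kai_llm_proxy_provider_id }}' does not yield a valid env
          var name; add an explicit entry to kai_llm_proxy_api_key_env_map.
      when: kai_llm_proxy_enabled | bool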

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7b24a47 and efc0ce3.

📒 Files selected for processing (9)
  • roles/tackle/defaults/main.yml (2 hunks)
  • roles/tackle/tasks/kai-status.yml (1 hunks)
  • roles/tackle/tasks/kai.yml (2 hunks)
  • roles/tackle/templates/deployment-hub.yml.j2 (2 hunks)
  • roles/tackle/templates/deployment-ui.yml.j2 (1 hunks)
  • roles/tackle/templates/kai/kai-db-deployment.yaml.j2 (2 hunks)
  • roles/tackle/templates/kai/llm-proxy-client-configmap.yaml.j2 (1 hunks)
  • roles/tackle/templates/kai/llm-proxy-configmap.yaml.j2 (1 hunks)
  • roles/tackle/templates/kai/llm-proxy-deployment.yaml.j2 (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • roles/tackle/tasks/kai.yml
  • roles/tackle/templates/kai/llm-proxy-configmap.yaml.j2
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: run-ci / e2e-api-integration-tests
🔇 Additional comments (14)
roles/tackle/templates/kai/kai-db-deployment.yaml.j2 (2)

6-6: LGTM: Namespace formatting change.

The unquoted namespace is consistent with standard Jinja2 templating practices.


31-45: The review comment's premise is incorrect—these are not duplicate environment variables.

The code shows an intentional mapping, not duplication:

  • The secret stores only POSTGRESQL_* keys (POSTGRESQL_DATABASE, POSTGRESQL_PASSWORD, POSTGRESQL_USER, POSTGRESQL_HOST)
  • The deployment surfaces these as POSTGRES_* environment variables, which the official PostgreSQL Docker image expects

This is a necessary conversion between naming conventions. The secret uses POSTGRESQL_ prefixes (common in Keycloak-based setups), while the PostgreSQL image requires POSTGRES_ prefixed environment variables. No action is needed.

Likely an incorrect or invalid review comment.

roles/tackle/templates/deployment-ui.yml.j2 (1)

97-100: LGTM: Clean conditional environment variable.

The KAI_LLM_PROXY_URL environment variable is properly gated by kai_llm_proxy_enabled and follows the established pattern for conditional feature configuration.

roles/tackle/templates/deployment-hub.yml.j2 (2)

132-133: LGTM: Keycloak audience configuration.

The KEYCLOAK_AUDIENCE environment variable is properly placed within the Keycloak configuration block and sourced from the keycloak_api_audience default value.


189-192: LGTM: LLM proxy configuration mapping.

The LLM_PROXY_CONFIG_MAP environment variable is properly gated by kai_llm_proxy_enabled and references the client ConfigMap name, enabling the Hub to serve the LLM proxy configuration to clients.

roles/tackle/tasks/kai-status.yml (3)

23-29: Review complex readiness logic.

The readiness check ensures all replicas are ready and updated. However, the expression is complex and spans multiple lines. Consider the following:

  1. Line 28: The condition checks readyReplicas == replicas, but defaults replicas to 1 if undefined. Verify this default is appropriate—typically deployment replicas should be explicitly defined.
  2. Line 29: Similar default for updatedReplicas comparison.
  3. The compound boolean spanning multiple lines may benefit from intermediate variables for clarity.

The logic appears correct but confirm the default values align with expected deployment behavior.
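
If the intermediate-variable route is taken, it could look like this (fact names are hypothetical; the deployment lookup variable is from this PR):

    - name: Compute LLM proxy replica counts
      when: llm_proxy_deployment.resources | default([]) | length > 0
      set_fact:
        llm_proxy_spec_replicas: "{{ llm_proxy_deployment.resources[0].status.replicas | default(1) }}"
        llm_proxy_ready_replicas: "{{ llm_proxy_deployment.resources[0].status.readyReplicas | default(0) }}"
        llm_proxy_updated_replicas: "{{ llm_proxy_deployment.resources[0].status.updatedReplicas | default(0) }}"

    - name: Set LLM proxy ready status
      when: llm_proxy_ready_replicas is defined
      set_fact:
        llm_proxy_ready: "{{ llm_proxy_ready_replicas | int > 0 and
                             llm_proxy_ready_replicas | int == llm_proxy_spec_replicas | int and
                             llm_proxy_updated_replicas | int == llm_proxy_spec_replicas | int }}"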


32-38: LGTM: Blocking and deployment status logic.

The logic correctly determines if Kai is blocked by the proxy (proxy enabled but not ready) and separately tracks Kai API deployment readiness. The compound conditions properly cascade to the status updates.


40-87: LGTM: Comprehensive status condition updates.

The three status conditions (LLMProxyReady, KaiSolutionServerReady, KaiAPIKeysConfigured) are properly structured with:

  • Correct status, reason, and message fields
  • Proper timestamp formatting
  • Appropriate gating (LLMProxyReady only when enabled)
  • Clear user-facing messages for troubleshooting

The implementation aligns with Kubernetes status conventions.

roles/tackle/templates/kai/llm-proxy-deployment.yaml.j2 (3)

24-26: LGTM: Config-driven pod restart mechanism.

The checksum annotation ensures pods restart when the ConfigMap changes, which is essential for configuration updates. This is a standard pattern for configuration-driven deployments.


62-68: LGTM: Self-signed certificate handling.

The conditional SSL certificate configuration properly handles self-signed certificates when Keycloak uses HTTPS, setting both SSL_CERT_FILE and REQUESTS_CA_BUNDLE environment variables.


74-82: LGTM: Dynamic API key injection.

The template dynamically injects all keys from the kai-api-keys secret as environment variables, providing flexibility for different LLM providers. This pattern correctly handles the case where no secret exists.

roles/tackle/defaults/main.yml (3)

123-123: LGTM: Keycloak API audience default.

The keycloak_api_audience default value aligns with the PR objectives to integrate JWT validation against Keycloak.


294-303: LGTM: Proxy-specific LLM parameters.

The kai_llm_params_proxy structure properly mirrors kai_llm_params while pointing to the local proxy service. The model format (provider_id/model) and model_provider: "openai_api" indicate OpenAI-compatible API usage through the proxy.


321-335: Verify silent default masking configuration errors on line 335.

The fallback format remote::<provider_name> is correct for llama-stack. However, line 335 defaults kai_llm_provider to "openai" when null, which silently masks missing or invalid configuration:

kai_llm_proxy_provider_id: "{{ kai_llm_provider | default('openai') }}"

Consider raising an error or requiring explicit configuration instead of silently defaulting to "openai".
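
A minimal guard, if explicit configuration is preferred (a sketch; where it lives in the role is up to the author):

    - name: Require an explicit LLM provider when the proxy is enabled
      fail:
        msg: "kai_llm_provider must be set explicitly when kai_llm_proxy_enabled is true"
      when:
        - kai_llm_proxy_enabled | bool
        - kai_llm_provider is not defined or kai_llm_provider is none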

Comment on lines 1 to 15
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ kai_llm_proxy_client_config_name }}
  namespace: {{ app_namespace }}
  labels:
    app.kubernetes.io/name: llm-proxy-client-config
    app.kubernetes.io/component: kai
    app.kubernetes.io/part-of: tackle
data:
  config.json: |
    {
      "model": "{{ kai_llm_proxy_provider_id }}/{{ kai_llm_model }}"
    }
No newline at end of file

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

Verify the model format matches LLM proxy expectations.

The ConfigMap defines a simple client configuration with the model field formatted as "{{ kai_llm_proxy_provider_id }}/{{ kai_llm_model }}". Ensure this format (provider/model) is compatible with the LLM proxy service and client implementation.

Additionally, confirm that kai_llm_model will always be defined when kai_llm_proxy_enabled is true, as this template doesn't handle the null case.


🏁 Script executed:

#!/bin/bash
# Search for LLM proxy client usage patterns and model format expectations
echo "Checking for LLM proxy model format references..."
rg -nC5 'kai_llm_proxy_provider_id|kai_llm_model|config\.json' 

echo -e "\nChecking for null handling in related templates..."
rg -nC3 'kai_llm_model.*null|kai_llm_model\s*\|'

Length of output: 9643


Looking at the script output, I need to verify when the llm-proxy-client-configmap.yaml.j2 template is rendered and whether it's conditionally protected.


🏁 Script executed:

#!/bin/bash
# Find where llm-proxy-client-configmap.yaml.j2 is referenced/used
echo "=== Searching for llm-proxy-client-configmap template references ==="
rg -nC5 'llm-proxy-client-configmap'

echo -e "\n=== Searching for kai_llm_proxy_enabled definition and usage ==="
rg -nC3 'kai_llm_proxy_enabled'

echo -e "\n=== Checking conditionals in KAI-related role files ==="
fd -e yml -e j2 'kai|llm' roles/tackle/templates/kai/ --exec grep -l 'if kai_llm' {} \;

Length of output: 5586


🏁 Script executed:

#!/bin/bash
# Also check how this template is registered in the role
echo "=== Checking role tasks for template handling ==="
fd -e yml 'main\.yml|tasks' roles/tackle/ --exec grep -l 'llm-proxy-client' {} \;

Length of output: 80


🏁 Script executed:

#!/bin/bash
# Find where this template is used in role tasks
echo "=== Searching for template task that renders llm-proxy-client-configmap ==="
rg -n 'llm-proxy-client-configmap' roles/tackle/tasks/ 

echo -e "\n=== Searching all tasks/main.yml files in tackle role ==="
cat roles/tackle/tasks/main.yml | head -100

echo -e "\n=== Looking for kai_llm_proxy_enabled in tasks ==="
rg -n 'kai_llm_proxy_enabled' roles/tackle/tasks/

Length of output: 3921


🏁 Script executed:

#!/bin/bash
# Search for validation or interdependencies between kai_llm_proxy_enabled and kai_llm_model
echo "=== Looking for validation logic or guards ==="
rg -nC5 'kai_llm_proxy_enabled.*kai_llm_model|kai_llm_model.*kai_llm_proxy_enabled'

echo -e "\n=== Checking if kai_llm_model has other conditional checks ==="
rg -nB3 -A3 'kai_llm_model' roles/tackle/tasks/kai.yml

echo -e "\n=== Searching in entire kai.yml for related context ==="
cat roles/tackle/tasks/kai.yml | grep -A20 'when: kai_llm_proxy_enabled'

Length of output: 796


Add null-check for kai_llm_model to prevent invalid JSON when proxy is enabled without a model specified.

The ConfigMap template uses kai_llm_model without null protection, unlike the related llm-proxy-configmap.yaml.j2 template which wraps the model reference in a {% if kai_llm_model %} check. If kai_llm_proxy_enabled is true but kai_llm_model remains null (from defaults), the rendered JSON becomes invalid: "model": "openai/null".

Add the same null-checking pattern from the sibling template to ensure the ConfigMap is only created when both conditions are met.
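
One way to express that gating at the task level, mirroring the existing "Create LLM Proxy ..." tasks (the exact task name is assumed, not taken from the PR):

    - name: Create LLM Proxy client ConfigMap
      k8s:
        state: present
        template: kai/llm-proxy-client-configmap.yaml.j2
      when:
        - kai_llm_proxy_enabled | bool
        - kai_llm_model is not none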

Comment on lines +35 to +56
startupProbe:
  httpGet:
    path: /v1/health
    port: 8321
  failureThreshold: 30
  periodSeconds: 10
  initialDelaySeconds: 10
livenessProbe:
  httpGet:
    path: /v1/health
    port: 8321
  periodSeconds: 30
  timeoutSeconds: 5
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /v1/health
    port: 8321
  periodSeconds: 10
  timeoutSeconds: 5
  successThreshold: 1
  failureThreshold: 3

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

Verify health endpoint and probe timings.

The deployment uses /v1/health for all probes with the following configuration:

  • Startup: 30 failures × 10s = 5 minutes max startup time
  • Liveness: 30s period, 3 failures before restart
  • Readiness: 10s period

Confirm that:

  1. The /v1/health endpoint exists in the llamastack/distribution-starter:latest image
  2. The 5-minute startup window is sufficient for model initialization
  3. The liveness failure threshold (90 seconds) allows for temporary slow responses without unnecessary restarts

🌐 Web query:

What health check endpoints does llama-stack distribution-starter provide?

💡 Result:

The starter distribution exposes HTTP health endpoints under /v1, specifically:

  • GET /v1/health (generic health). [1]
  • GET /v1/health/ready (readiness). [2]
  • GET /v1/health/live (liveness). [2]

Sources: starter/bring-your-own docs showing /v1/health and the API reference showing /v1/health/ready and /v1/health/live. [1][2]


Use dedicated health check endpoints for readiness and liveness probes.

The /v1/health endpoint exists, but llama-stack distribution-starter also provides dedicated /v1/health/ready and /v1/health/live endpoints. Update the configuration to use (see the sketch after this comment):

  • readinessProbe → /v1/health/ready
  • livenessProbe → /v1/health/live
  • startupProbe → /v1/health (or /v1/health/live)

Also verify through deployment testing that:

  • The 5-minute startup window is sufficient for model initialization
  • The 90-second liveness failure threshold (30s period × 3 failures) allows for temporary slow responses
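
A minimal sketch of the endpoint swap, keeping the existing timing values (the ready/live paths are taken from the web query above):

    readinessProbe:
      httpGet:
        path: /v1/health/ready
        port: 8321
      periodSeconds: 10
      timeoutSeconds: 5
      successThreshold: 1
      failureThreshold: 3
    livenessProbe:
      httpGet:
        path: /v1/health/live
        port: 8321
      periodSeconds: 30
      timeoutSeconds: 5
      failureThreshold: 3
    startupProbe:
      httpGet:
        path: /v1/health
        port: 8321
      failureThreshold: 30
      periodSeconds: 10
      initialDelaySeconds: 10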
🤖 Prompt for AI Agents
In roles/tackle/templates/kai/llm-proxy-deployment.yaml.j2 around lines 35 to
56, the probes all point to /v1/health; change them so readinessProbe uses
/v1/health/ready, livenessProbe uses /v1/health/live, and keep startupProbe
pointed at /v1/health (or /v1/health/live if preferred), then ensure the
existing timing values reflect desired behavior (confirm startupProbe
initialDelaySeconds and failureThreshold provide ~5 minutes for model init, and
that liveness periodSeconds × failureThreshold (~90s) tolerates transient
slowness) and run deployment tests to validate the timings and endpoints return
expected HTTP success codes.

- name: KAI_URL
  value: "{{ kai_url }}"
{% if kai_llm_proxy_enabled %}
- name: LLM_PROXY_CONFIG_MAP
Contributor

This isn't necessary. The hub endpoint serves the configmap by name.

Contributor Author

yep missed that on cleanup

- name: TACKLE_HUB_URL
  value: "{{ hub_url }}"
{% if kai_llm_proxy_enabled|bool %}
- name: KAI_LLM_PROXY_URL
Contributor

I thought we decided the new llm proxy service would be mapped in the main rev proxy instead of the brokering through the hub.

Contributor Author

Is this not setting it in the main reverse proxy? I set this envvar to be used here: https://github.com/konveyor/tackle2-ui/pull/2743/files

Contributor

Yes. My mistake. Thought it was the hub deployment.

Contributor

@jortel jortel left a comment

LGTM
Defer to @jmontleon regarding general ansible-operator things.

@fabianvf fabianvf merged commit 16dc436 into konveyor:main Dec 1, 2025
13 checks passed
Development

Successfully merging this pull request may close these issues.

Add LLM proxy to deployed services