feat(helm): Add Presto query engine support to the Helm chart. #2004

junhaoliao wants to merge 5 commits into y-scope:main
Conversation
Walkthrough

Adds Presto support to the Helm deployment: new Presto coordinator/worker templates, Presto-specific config and values, and conditional gating that disables the default query-scheduler/worker/reducer (and related PV/PVC/Service) when `.Values.clpConfig.package.query_engine == "presto"`. Documentation for Kubernetes and Docker Compose is added.

Changes: (summary table omitted)

Sequence Diagram(s): (mermaid diagram omitted)

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~30 minutes

🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 12
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/src/user-docs/guides-k8s-deployment.md`:
- Around line 359-362: Update the note text in the docs to correct the list of
components replaced when `query_engine` = "presto": remove "results cache" from
the excluded list and add "api-server"; reference the Helm gating logic when
editing the sentence so it reads that Presto replaces the "query scheduler,
query workers, reducers, and api-server" (since `query_scheduler`,
`query_workers`, and `reducers` are gated by `{{- if ne
.Values.clpConfig.package.query_engine \"presto\" }}` and the `api-server`
deployment is gated by `{{- if and .Values.clpConfig.api_server (ne
.Values.clpConfig.package.query_engine \"presto\") }}`), while `results-cache-*`
manifests are not conditionally removed.
- Around line 311-315: The prestoCoordinator and prestoWorker image definitions
currently use mutable tag values (tag: "dev"); update these to pinned/stable
release tags (e.g., tag: "vX.Y.Z") for both prestoCoordinator and prestoWorker
so deployments pull a fixed image; locate the tag fields under the
prestoCoordinator and prestoWorker blocks and replace "dev" with the chosen
semantic version string (or a chart value reference like {{
.Values.image.prestoCoordinator.tag }} and {{ .Values.image.prestoWorker.tag }}
if you prefer templating) to ensure reproducible deployments.
In `@docs/src/user-docs/guides-using-presto.md`:
- Around line 66-70: The docs show only a helm install command which will fail
when CLP is already deployed; update the step under the Presto install to
include the alternative helm upgrade invocation so users can perform an upgrade
when a release exists—specifically add guidance to run `helm upgrade clp clp/clp
DOCS_VAR_HELM_VERSION_FLAG -f presto-values.yaml` as the alternative to `helm
install` (referencing the existing `helm install clp clp/clp
DOCS_VAR_HELM_VERSION_FLAG -f presto-values.yaml` and the `presto-values.yaml`
values file).
In `@tools/deployment/package-helm/templates/configmap.yaml`:
- Around line 369-378: The JVM Xmx in presto-coordinator-config-jvm.config is
hardcoded as "-Xmx4G"; change it to render a Helm value so operators can
configure it: replace the literal "-Xmx4G" with a template that reads
.Values.clpConfig.presto.coordinator.jvm_max_heap (with a sensible default like
"4G"), ensure consumers pass the unit (e.g., "4G") or document it, and add the
new key plus a descriptive default and comment to values.yaml so the chart is
configurable and documented.
- Around line 347-348: The ConfigMap is embedding sensitive values
(clp.metadata-db-user, clp.metadata-db-password, clp.s3-access-key-id,
clp.s3-secret-access-key); remove these keys from templates/configmap.yaml and
instead create a Helm Secret template that stores those values (e.g.,
templates/secret.yaml with .Values.credentials.database and
.Values.credentials.s3 entries), then update consumers to read them from the
Secret (use env valueFrom.secretKeyRef, secret projection into a volume, or
template lookup/include to inject non-plain text values) so credentials are
stored as Kubernetes Secrets rather than literal ConfigMap entries.
- Line 397: The ConfigMap currently hardcodes AWS hostnames for the Presto S3
path via the clp.s3-end-point entry using {{ .s3_config.bucket }} and {{
.s3_config.region_code }}, which breaks S3-compatible stores; change the
template to prefer the configured endpoint_url on .s3_config (e.g.,
archive_output.storage.s3_config.endpoint_url / .s3_config.endpoint_url) when
present and fall back to the existing AWS-format URL only if endpoint_url is
empty, so Presto uses the explicit endpoint for MinIO/other S3-compatible
services.
In `@tools/deployment/package-helm/templates/presto-coordinator-deployment.yaml`:
- Around line 33-63: The presto-coordinator container lacks Kubernetes resource
requests/limits; add a configurable resources block for the container named
"presto-coordinator" (e.g., read values from a new or existing Helm key such as
.Values.prestoCoordinator.resources or .Values.resources.prestoCoordinator) and
populate both requests and limits for cpu and memory so they can be tuned
per-environment; update the chart values.yaml to include sensible defaults and
reference those values in the presto-coordinator deployment template.
In `@tools/deployment/package-helm/templates/presto-worker-deployment.yaml`:
- Around line 62-93: Add Kubernetes resource requests and limits to the
presto-worker container (container name "presto-worker") so CPU/memory are
enforced; update the deployment template to include a resources: block under
that container using Helm values (e.g. reference values like
.Values.resources.prestoWorker.requests.cpu,
.Values.resources.prestoWorker.requests.memory,
.Values.resources.prestoWorker.limits.cpu,
.Values.resources.prestoWorker.limits.memory or similar keys used elsewhere) and
ensure the memory request/limit aligns reasonably with the Presto config
(system_memory_gb/query_memory_gb) to avoid OOMKills; keep the block next to
ports/volumeMounts in the "presto-worker" container spec.
- Around line 38-55: The startup script that sets DISCOVERY_URI, calls wget into
VERSION and extracts PRESTO_VERSION can produce an empty PRESTO_VERSION (e.g.,
coordinator not ready) and currently would write a blank presto.version; update
the boot script (the block using set -e, DISCOVERY_URI, VERSION, PRESTO_VERSION)
to validate the coordinator response: implement a short retry loop with a
timeout that retries wget on ${DISCOVERY_URI}/v1/info a few times (or at least
reject empty output), extract PRESTO_VERSION only after non-empty VERSION, and
if PRESTO_VERSION is still empty log an error and exit non-zero (do not append a
blank presto.version); ensure the script does not proceed to echo
"presto.version=${PRESTO_VERSION}" unless PRESTO_VERSION is non-empty.
- Around line 69-86: The volume mount ordering for the Presto worker container
is subtle: the emptyDir mount named "presto-etc" at /opt/presto-server/etc must
be mounted before the child mounts ("presto-catalog" at
/opt/presto-server/etc/catalog and "presto-config" via subPath at
/opt/presto-server/etc/velox.properties) so init-container writes to the
emptyDir (config.properties, node.properties) remain visible; add a clear inline
comment near the volumeMounts block explaining this dependency and that
Kubernetes mounts parents before children (so do not reorder or replace
"presto-etc" with a deeper mount), referencing the mount names presto-etc,
presto-catalog, presto-config and the target path /opt/presto-server/etc.
- Around line 33-55: The init container "setup-configs" is using an unpinned
busybox image and a brittle sed-based JSON parse for PRESTO_VERSION; change the
image (in the "setup-configs" container spec) from busybox:latest to a pinned
image that includes jq (or make the tag configurable via values.yaml), and
replace the sed pipeline that sets PRESTO_VERSION (the commands that assign
VERSION and PRESTO_VERSION) to use a robust JSON parser like jq (e.g., fetch
DISCOVERY_URI "/v1/info" and pipe to jq -r .version) and add a simple check to
fail or default if PRESTO_VERSION is empty so the echo "presto.version=..." step
is reliable.
In `@tools/deployment/package-helm/values.yaml`:
- Around line 20-27: The Helm values set mutable dev tags for Presto images:
update prestoCoordinator.tag and prestoWorker.tag to use a stable release tag
convention (e.g., match clpPackage.tag which uses "main" or a semantic version)
so defaults are deterministic; modify the values for prestoCoordinator and
prestoWorker to a stable tag string and document in values.yaml comment that
operators should override tag for development workflows, ensuring consistency
across clpPackage, prestoCoordinator, and prestoWorker.
```yaml
  repository: "ghcr.io/y-scope/presto/coordinator"
  tag: "dev"
prestoWorker:
  repository: "ghcr.io/y-scope/presto/prestissimo-worker"
  tag: "dev"
```
🧩 Analysis chain
🏁 Script executed:
```shell
# First, let's run the provided shell script to check Helm chart values
fd "values.yaml" tools/deployment/package-helm \
  --exec grep -n "prestoCoordinator\|prestoWorker\|presto.*tag\|tag.*presto" {}
```

Repository: y-scope/clp
Length of output: 114
🏁 Script executed:
```shell
# Let's also check the full context around the documentation lines mentioned
cd docs/src/user-docs
cat -n guides-k8s-deployment.md | sed -n '300,330p'
```

Repository: y-scope/clp
Length of output: 1087
🏁 Script executed:
```shell
# Check if DOCS_VAR_CLP_GIT_REF is used in the documentation
grep -n "DOCS_VAR_CLP_GIT_REF" docs/src/user-docs/guides-k8s-deployment.md | head -20
```

Repository: y-scope/clp
Length of output: 37
🏁 Script executed:
```shell
# Let's explore the Helm chart structure
find tools/deployment/package-helm -type f -name "*.yaml" -o -name "*.yml" | head -20
```

Repository: y-scope/clp
Length of output: 1474
🏁 Script executed:
```shell
# Read the Helm chart values.yaml to see the actual configuration
cd tools/deployment/package-helm
cat values.yaml | head -150
```

Repository: y-scope/clp
Length of output: 4185
🏁 Script executed:
```shell
# Check specifically for image tag configurations in values.yaml
grep -A 5 -B 5 "prestoCoordinator\|prestoWorker" tools/deployment/package-helm/values.yaml
```

Repository: y-scope/clp
Length of output: 730
🏁 Script executed:
```shell
# Look at other image configurations in the documentation to see how they handle tags
grep -n "tag:" docs/src/user-docs/guides-k8s-deployment.md | head -20
```

Repository: y-scope/clp
Length of output: 114
Replace "dev" image tags with stable version tags for Presto components.
The prestoCoordinator and prestoWorker images use tag: "dev", which is mutable and pulled fresh on every deployment (due to pullPolicy: "Always"). This can lead to unexpected breaking changes between deployments.
Use a stable version tag (e.g., a specific release version) instead of the floating "dev" tag to ensure consistent, predictable deployments. This aligns with the stability provided by other pinned image tags in the Helm chart.
```markdown
:::{note}
When `query_engine` is set to `"presto"`, the chart deploys a Presto coordinator and Presto
worker(s) instead of the query scheduler, query workers, reducers, and results cache.
:::
```
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
# Search for conditional logic around results_cache, api-server, and other components
# in the Helm templates to determine what's excluded when query_engine="presto"
echo "=== Searching for results_cache conditional logic ==="
rg -n 'results.cache|resultsCache' tools/deployment/package-helm/templates/ \
  -A 5 -B 5 --type yaml
echo -e "\n=== Searching for api-server conditional logic ==="
rg -n 'api.server|apiServer|api_server' tools/deployment/package-helm/templates/ \
  -A 5 -B 5 --type yaml
echo -e "\n=== Searching for query_engine conditional logic ==="
rg -n 'query_engine|queryEngine' tools/deployment/package-helm/templates/ \
  -A 5 -B 5 --type yaml
```

Repository: y-scope/clp
Length of output: 50367
Note is inaccurate: results_cache should not be listed as excluded, and api-server is missing.
The note states Presto replaces "the query scheduler, query workers, reducers, and results cache", but:
- The Helm templates show no conditional gating on `query_engine` for any `results-cache-*` manifests (service, statefulset, indices-creator job), confirming it remains deployed when Presto is enabled (only with retention disabled).
- The api-server deployment is actually gated by `{{- if and .Values.clpConfig.api_server (ne .Values.clpConfig.package.query_engine "presto") }}`, so it is excluded and should be mentioned.
- The query scheduler, query workers, and reducers are all gated by `{{- if ne .Values.clpConfig.package.query_engine "presto" }}`, confirming they are excluded.

Update the note to list: query scheduler, query workers, reducers, and api-server, not results cache.
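Per that gating, a corrected note could read:

```markdown
:::{note}
When `query_engine` is set to `"presto"`, the chart deploys a Presto coordinator and Presto
worker(s) instead of the query scheduler, query workers, reducers, and api-server.
:::
```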
2. Install (or upgrade) the Helm chart with the Presto values:

   ```bash
   helm install clp clp/clp DOCS_VAR_HELM_VERSION_FLAG -f presto-values.yaml
   ```
helm install will fail for users with an existing CLP deployment.
The requirement at line 33 explicitly states the user already has "a running CLP Kubernetes deployment," yet step 2 only shows `helm install`, which would error out with a "release already exists" error. The parenthetical "(or upgrade)" acknowledges the upgrade path but never shows the corresponding command.
💡 Suggested fix
````diff
-2. Install (or upgrade) the Helm chart with the Presto values:
+2. If you haven't installed CLP yet, install it with Presto enabled:

    ```bash
    helm install clp clp/clp DOCS_VAR_HELM_VERSION_FLAG -f presto-values.yaml
    ```
+
+   If CLP is already installed, upgrade the release instead:
+
+   ```bash
+   helm upgrade clp clp/clp DOCS_VAR_HELM_VERSION_FLAG -f presto-values.yaml
+   ```
````

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
````markdown
2. If you haven't installed CLP yet, install it with Presto enabled:

   ```bash
   helm install clp clp/clp DOCS_VAR_HELM_VERSION_FLAG -f presto-values.yaml
   ```

   If CLP is already installed, upgrade the release instead:

   ```bash
   helm upgrade clp clp/clp DOCS_VAR_HELM_VERSION_FLAG -f presto-values.yaml
   ```
````
```
clp.metadata-db-user={{ $.Values.credentials.database.username }}
clp.metadata-db-password={{ $.Values.credentials.database.password }}
```
Credentials stored in plaintext in a ConfigMap — use a Secret instead.
clp.metadata-db-password (and clp.metadata-db-user) are written directly into the ConfigMap. ConfigMaps are stored unencrypted in etcd by default, are readable by any principal with get/list on ConfigMaps in the namespace, and are fully visible in kubectl get configmap -o yaml. Kubernetes Secrets are the appropriate resource for sensitive values: they can be encrypted at rest, carry separate RBAC controls, and are omitted from audit logs by default.
The same concern applies to the S3 credentials at lines 396/398 (clp.s3-access-key-id, clp.s3-secret-access-key).
Consider rendering these files from a Kubernetes Secret (or multiple projected volume mounts) rather than embedding credentials in the ConfigMap.
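A sketch of what such a Secret template might look like (the resource name and the `credentials.s3.*` value paths are illustrative assumptions, not names from the chart):

```yaml
# templates/secret.yaml (hypothetical sketch)
apiVersion: v1
kind: Secret
metadata:
  name: {{ include "clp.fullname" . }}-credentials
type: Opaque
stringData:
  metadata-db-user: {{ $.Values.credentials.database.username | quote }}
  metadata-db-password: {{ $.Values.credentials.database.password | quote }}
  s3-access-key-id: {{ $.Values.credentials.s3.access_key_id | quote }}
  s3-secret-access-key: {{ $.Values.credentials.s3.secret_access_key | quote }}
```

Consumers could then project these keys into the Presto config directory as a Secret volume, or reference them via `valueFrom.secretKeyRef`, instead of reading them from the ConfigMap.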
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/deployment/package-helm/templates/configmap.yaml` around lines 347 -
348, The ConfigMap is embedding sensitive values (clp.metadata-db-user,
clp.metadata-db-password, clp.s3-access-key-id, clp.s3-secret-access-key);
remove these keys from templates/configmap.yaml and instead create a Helm Secret
template that stores those values (e.g., templates/secret.yaml with
.Values.credentials.database and .Values.credentials.s3 entries), then update
consumers to read them from the Secret (use env valueFrom.secretKeyRef, secret
projection into a volume, or template lookup/include to inject non-plain text
values) so credentials are stored as Kubernetes Secrets rather than literal
ConfigMap entries.
```yaml
presto-coordinator-config-jvm.config: |
  -server
  -Xmx4G
  -XX:+UseG1GC
  -XX:G1HeapRegionSize=32M
  -XX:+UseGCOverheadLimit
  -XX:+ExplicitGCInvokesConcurrent
  -XX:+HeapDumpOnOutOfMemoryError
  -XX:+ExitOnOutOfMemoryError
  -Djdk.attach.allowAttachSelf=true
```
🧹 Nitpick | 🔵 Trivial
Hardcoded -Xmx4G should be a configurable value.
The coordinator JVM heap is fixed at 4 GB regardless of the resource limits/requests set on the coordinator pod. If the pod is scheduled with less memory, it will OOM-kill; if scheduled with more, the heap cannot grow to use it. Expose this as a value (e.g., .Values.clpConfig.presto.coordinator.jvm_max_heap) so operators can tune it alongside query.max-memory and query.max-memory-per-node.
♻️ Suggested approach
```diff
 presto-coordinator-config-jvm.config: |
   -server
-  -Xmx4G
+  -Xmx{{ .coordinator.jvm_max_heap_gb }}G
   -XX:+UseG1GC
```
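A matching values entry could look like this (the key name `jvm_max_heap_gb` follows the suggestion above and is not an existing chart value):

```yaml
# values.yaml (hypothetical addition)
clpConfig:
  presto:
    coordinator:
      # Max JVM heap for the Presto coordinator, in GiB.
      # Keep this below the pod's memory limit.
      jvm_max_heap_gb: 4
```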
```yaml
- name: "setup-configs"
  image: "busybox:latest"
  command:
    - "/bin/sh"
    - "-c"
    - |
      set -e
      cp /etc/presto-config/presto-worker-config-config.properties /opt/presto-server/etc/config.properties
      cp /etc/presto-config/presto-worker-config-node.properties /opt/presto-server/etc/node.properties

      # Query coordinator for version and add it to config
      DISCOVERY_URI="http://{{ include "clp.fullname" . }}-presto-coordinator:8080"
      VERSION=$(wget -q -O - "${DISCOVERY_URI}/v1/info")
      PRESTO_VERSION=$(echo "$VERSION" \
        | sed 's/.*"version":"//' \
        | sed 's/".*//')
      echo "presto.version=${PRESTO_VERSION}" >> /opt/presto-server/etc/config.properties

      # Set node identity from hostname
      HOSTNAME=$(hostname)
      HOST_IP=$(hostname -i)
      echo "node.internal-address=${HOST_IP}" >> /opt/presto-server/etc/node.properties
      echo "node.id=${HOSTNAME}" >> /opt/presto-server/etc/node.properties
```
Fragile JSON parsing with sed and unpinned busybox:latest image.
Two concerns in this init container:
- `busybox:latest`: using the `latest` tag hurts reproducibility and can introduce unexpected breakages. Pin to a specific version (e.g., `busybox:1.37`), or make it configurable via values.
- JSON parsing via `sed` (lines 46–48): extracting `version` from a JSON response using `sed 's/.*"version":"//' | sed 's/".*//'` is brittle. If the JSON structure changes (e.g., field ordering, nested `version` keys, or whitespace variations), parsing will silently produce incorrect results. Since `busybox` doesn't include `jq`, consider switching to an image that has `jq`, or using a more robust `awk`/`grep` pattern.
Proposed fix for image tag pinning and safer parsing
```diff
       - name: "setup-configs"
-        image: "busybox:latest"
+        image: "busybox:1.37"
         command:
           - "/bin/sh"
           - "-c"
           - |
             set -e
             cp /etc/presto-config/presto-worker-config-config.properties /opt/presto-server/etc/config.properties
             cp /etc/presto-config/presto-worker-config-node.properties /opt/presto-server/etc/node.properties
             # Query coordinator for version and add it to config
             DISCOVERY_URI="http://{{ include "clp.fullname" . }}-presto-coordinator:8080"
             VERSION=$(wget -q -O - "${DISCOVERY_URI}/v1/info")
-            PRESTO_VERSION=$(echo "$VERSION" \
-              | sed 's/.*"version":"//' \
-              | sed 's/".*//')
+            PRESTO_VERSION=$(echo "$VERSION" \
+              | grep -o '"version":"[^"]*"' \
+              | head -1 \
+              | sed 's/"version":"//;s/"//')
             echo "presto.version=${PRESTO_VERSION}" >> /opt/presto-server/etc/config.properties
```
```yaml
- |
  set -e
  cp /etc/presto-config/presto-worker-config-config.properties /opt/presto-server/etc/config.properties
  cp /etc/presto-config/presto-worker-config-node.properties /opt/presto-server/etc/node.properties

  # Query coordinator for version and add it to config
  DISCOVERY_URI="http://{{ include "clp.fullname" . }}-presto-coordinator:8080"
  VERSION=$(wget -q -O - "${DISCOVERY_URI}/v1/info")
  PRESTO_VERSION=$(echo "$VERSION" \
    | sed 's/.*"version":"//' \
    | sed 's/".*//')
  echo "presto.version=${PRESTO_VERSION}" >> /opt/presto-server/etc/config.properties

  # Set node identity from hostname
  HOSTNAME=$(hostname)
  HOST_IP=$(hostname -i)
  echo "node.internal-address=${HOST_IP}" >> /opt/presto-server/etc/node.properties
  echo "node.id=${HOSTNAME}" >> /opt/presto-server/etc/node.properties
```
No error handling if the coordinator version query fails.
If wget on Line 45 fails (e.g., coordinator not yet ready, network issue), $VERSION will be empty and $PRESTO_VERSION will be an empty string. This would write presto.version= to the config, likely causing the worker to fail with a confusing error. While set -e is present, wget returning an empty body with HTTP 200 would not trigger it.
Consider adding a validation check:
Proposed validation

```diff
             VERSION=$(wget -q -O - "${DISCOVERY_URI}/v1/info")
             PRESTO_VERSION=$(echo "$VERSION" \
               | sed 's/.*"version":"//' \
               | sed 's/".*//')
+            if [ -z "$PRESTO_VERSION" ]; then
+              echo "ERROR: Failed to retrieve Presto version from coordinator" >&2
+              exit 1
+            fi
             echo "presto.version=${PRESTO_VERSION}" >> /opt/presto-server/etc/config.properties
```

📝 Committable suggestion
```yaml
- |
  set -e
  cp /etc/presto-config/presto-worker-config-config.properties /opt/presto-server/etc/config.properties
  cp /etc/presto-config/presto-worker-config-node.properties /opt/presto-server/etc/node.properties
  # Query coordinator for version and add it to config
  DISCOVERY_URI="http://{{ include "clp.fullname" . }}-presto-coordinator:8080"
  VERSION=$(wget -q -O - "${DISCOVERY_URI}/v1/info")
  PRESTO_VERSION=$(echo "$VERSION" \
    | sed 's/.*"version":"//' \
    | sed 's/".*//')
  if [ -z "$PRESTO_VERSION" ]; then
    echo "ERROR: Failed to retrieve Presto version from coordinator" >&2
    exit 1
  fi
  echo "presto.version=${PRESTO_VERSION}" >> /opt/presto-server/etc/config.properties
  # Set node identity from hostname
  HOSTNAME=$(hostname)
  HOST_IP=$(hostname -i)
  echo "node.internal-address=${HOST_IP}" >> /opt/presto-server/etc/node.properties
  echo "node.id=${HOSTNAME}" >> /opt/presto-server/etc/node.properties
```
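The retry behaviour this comment asks for can be sketched as a small POSIX shell helper (illustrative only; the function names, retry count, and sleep interval are assumptions, not chart code):

```shell
# Extract the "version" field from the coordinator's /v1/info JSON body.
# Returns an empty string when the field is absent, instead of garbage.
parse_version() {
    echo "$1" | grep -o '"version":"[^"]*"' | head -1 \
        | sed 's/"version":"//;s/"$//'
}

# Retry fetching the coordinator's info endpoint before giving up, so a
# not-yet-ready coordinator doesn't produce a blank presto.version entry.
fetch_presto_version() {
    uri="$1"
    for _attempt in 1 2 3 4 5; do
        body=$(wget -q -O - "${uri}/v1/info" 2>/dev/null || true)
        version=$(parse_version "$body")
        if [ -n "$version" ]; then
            echo "$version"
            return 0
        fi
        sleep 2
    done
    echo "ERROR: Failed to retrieve Presto version from coordinator" >&2
    return 1
}
```

In the init container, the direct `wget`/`sed` pipeline would be replaced by a call to `fetch_presto_version`, with the script exiting non-zero on failure rather than appending an empty `presto.version=` line.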
```yaml
containers:
  - name: "presto-worker"
    image: "{{ .Values.image.prestoWorker.repository }}:{{ .Values.image.prestoWorker.tag }}"
    imagePullPolicy: "{{ .Values.image.prestoWorker.pullPolicy }}"
    ports:
      - name: "presto-worker"
        containerPort: 8080
    volumeMounts:
      - name: "presto-catalog"
        mountPath: "/opt/presto-server/etc/catalog"
        readOnly: true
      - name: "presto-config"
        mountPath: "/opt/presto-server/etc/velox.properties"
        subPath: "presto-worker-config-velox.properties"
        readOnly: true
      - name: "presto-etc"
        mountPath: "/opt/presto-server/etc"
      {{- if eq .Values.clpConfig.archive_output.storage.type "fs" }}
      - name: {{ include "clp.volumeName" (dict
          "component_category" "shared-data"
          "name" "archives"
        ) | quote }}
        mountPath: "/var/data/archives"
        readOnly: true
      {{- end }}
    readinessProbe:
      {{- include "clp.readinessProbeTimings" . | nindent 12 }}
      tcpSocket: &presto-worker-health-check
        port: "presto-worker"
    livenessProbe:
      {{- include "clp.livenessProbeTimings" . | nindent 12 }}
      tcpSocket: *presto-worker-health-check
```
🧹 Nitpick | 🔵 Trivial
No resources (requests/limits) defined on the worker container.
Same concern as the coordinator: the worker container has no CPU/memory resource requests or limits. This is especially important for Presto workers, which can be memory-intensive. The system_memory_gb and query_memory_gb values are set in the Presto config but not enforced at the Kubernetes level, which could lead to OOMKills or overcommitment.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/deployment/package-helm/templates/presto-worker-deployment.yaml` around
lines 62 - 93, Add Kubernetes resource requests and limits to the presto-worker
container (container name "presto-worker") so CPU/memory are enforced; update
the deployment template to include a resources: block under that container using
Helm values (e.g. reference values like
.Values.resources.prestoWorker.requests.cpu,
.Values.resources.prestoWorker.requests.memory,
.Values.resources.prestoWorker.limits.cpu,
.Values.resources.prestoWorker.limits.memory or similar keys used elsewhere) and
ensure the memory request/limit aligns reasonably with the Presto config
(system_memory_gb/query_memory_gb) to avoid OOMKills; keep the block next to
ports/volumeMounts in the "presto-worker" container spec.
```yaml
volumeMounts:
  - name: "presto-catalog"
    mountPath: "/opt/presto-server/etc/catalog"
    readOnly: true
  - name: "presto-config"
    mountPath: "/opt/presto-server/etc/velox.properties"
    subPath: "presto-worker-config-velox.properties"
    readOnly: true
  - name: "presto-etc"
    mountPath: "/opt/presto-server/etc"
  {{- if eq .Values.clpConfig.archive_output.storage.type "fs" }}
  - name: {{ include "clp.volumeName" (dict
      "component_category" "shared-data"
      "name" "archives"
    ) | quote }}
    mountPath: "/var/data/archives"
    readOnly: true
  {{- end }}
```
🧹 Nitpick | 🔵 Trivial
Volume mount layering at /opt/presto-server/etc — verify correctness.
The main container mounts:
- `presto-catalog` (configMap) at `/opt/presto-server/etc/catalog`
- `presto-config` (configMap, subPath) at `/opt/presto-server/etc/velox.properties`
- `presto-etc` (emptyDir) at `/opt/presto-server/etc`
Kubernetes sorts mounts by path length (parent before child), so the emptyDir is mounted first, then the catalog and velox.properties are overlaid. The init container's writes (config.properties, node.properties) live in the emptyDir and should remain visible. This layering should work, but it's subtle and easy to break during future refactors.
Consider adding a comment in the template explaining the mount ordering dependency.
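Such a comment might read as follows (a sketch of the suggestion, reusing the mount names from the template):

```yaml
# NOTE: mount-ordering dependency. Kubernetes mounts parent paths before
# children, so the "presto-etc" emptyDir is mounted at /opt/presto-server/etc
# first and the "presto-catalog" / "presto-config" mounts overlay it. The init
# container's config.properties and node.properties live in the emptyDir and
# must stay visible; do not replace "presto-etc" with a deeper mount.
- name: "presto-etc"
  mountPath: "/opt/presto-server/etc"
```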
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/deployment/package-helm/templates/presto-worker-deployment.yaml` around
lines 69 - 86, The volume mount ordering for the Presto worker container is
subtle: the emptyDir mount named "presto-etc" at /opt/presto-server/etc must be
mounted before the child mounts ("presto-catalog" at
/opt/presto-server/etc/catalog and "presto-config" via subPath at
/opt/presto-server/etc/velox.properties) so init-container writes to the
emptyDir (config.properties, node.properties) remain visible; add a clear inline
comment near the volumeMounts block explaining this dependency and that
Kubernetes mounts parents before children (so do not reorder or replace
"presto-etc" with a deeper mount), referencing the mount names presto-etc,
presto-catalog, presto-config and the target path /opt/presto-server/etc.
```yaml
prestoCoordinator:
  repository: "ghcr.io/y-scope/presto/coordinator"
  pullPolicy: "Always"
  tag: "dev"
prestoWorker:
  repository: "ghcr.io/y-scope/presto/prestissimo-worker"
  pullPolicy: "Always"
  tag: "dev"
```
🧹 Nitpick | 🔵 Trivial
Presto image tags default to "dev" — consider aligning with a release tag convention.
The clpPackage image defaults to "main", while both Presto images default to "dev". This is fine for development but worth noting that users deploying from the defaults will pull a mutable, potentially unstable tag.
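For example, an operator-side override pinning both images to an immutable tag could look like this sketch (the tag `v0.1.0` is purely illustrative, not a published release):

```yaml
# Hypothetical values override; the tag names are illustrative.
image:
  prestoCoordinator:
    tag: "v0.1.0"
    pullPolicy: "IfNotPresent"
  prestoWorker:
    tag: "v0.1.0"
    pullPolicy: "IfNotPresent"
```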
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/deployment/package-helm/values.yaml` around lines 20 - 27, The Helm
values set mutable dev tags for Presto images: update prestoCoordinator.tag and
prestoWorker.tag to use a stable release tag convention (e.g., match
clpPackage.tag which uses "main" or a semantic version) so defaults are
deterministic; modify the values for prestoCoordinator and prestoWorker to a
stable tag string and document in values.yaml comment that operators should
override tag for development workflows, ensuring consistency across clpPackage,
prestoCoordinator, and prestoWorker.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tools/deployment/package-helm/Chart.yaml`:
- Line 3: The Chart.yaml pre-release version was mistakenly advanced to
"0.1.4-dev.4", skipping "0.1.4-dev.3"; update the version field in Chart.yaml
from "0.1.4-dev.4" to "0.1.4-dev.3" so the pre-release sequence remains
contiguous (look for the existing version: "0.1.4-dev.4" and replace it with
"0.1.4-dev.3").
```diff
 apiVersion: "v2"
 name: "clp"
-version: "0.1.4-dev.2"
+version: "0.1.4-dev.4"
```
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
# Check whether 0.1.4-dev.3 is used in any branch or tag
git grep "0.1.4-dev.3" $(git for-each-ref --format='%(refname)' refs/heads refs/remotes refs/tags) -- "tools/deployment/package-helm/Chart.yaml" 2>/dev/null || echo "Not found on any ref"
```

Repository: y-scope/clp
Length of output: 75
Change version to 0.1.4-dev.3 to maintain sequential pre-release versioning.
The version skips from 0.1.4-dev.2 directly to 0.1.4-dev.4, but 0.1.4-dev.3 does not exist on any branch or tag in the repository. This skip appears unintentional and breaks the contiguous pre-release sequence.
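As a side note, SemVer pre-release precedence (dot-separated identifiers, numeric ones compared numerically and ranking below alphanumeric ones) can be sketched as follows; this toy key ignores SemVer's rule that a plain release outranks its pre-releases:

```python
def prerelease_key(version: str):
    """Sortable key for versions like '0.1.4-dev.2' (pre-release ordering only)."""
    core, _, pre = version.partition("-")
    core_ids = tuple(int(p) for p in core.split("."))
    # Numeric identifiers compare numerically and rank below alphanumeric ones.
    pre_ids = tuple(
        (0, int(p), "") if p.isdigit() else (1, 0, p)
        for p in pre.split(".")
    ) if pre else ()
    return core_ids, pre_ids

versions = ["0.1.4-dev.4", "0.1.4-dev.2", "0.1.4-dev.3"]
print(sorted(versions, key=prerelease_key))
# → ['0.1.4-dev.2', '0.1.4-dev.3', '0.1.4-dev.4']
```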
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/deployment/package-helm/Chart.yaml` at line 3, The Chart.yaml
pre-release version was mistakenly advanced to "0.1.4-dev.4", skipping
"0.1.4-dev.3"; update the version field in Chart.yaml from "0.1.4-dev.4" to
"0.1.4-dev.3" so the pre-release sequence remains contiguous (look for the
existing version: "0.1.4-dev.4" and replace it with "0.1.4-dev.3").
```diff
@@ -1,4 +1,4 @@
-{{- if .Values.clpConfig.api_server }}
+{{- if and .Values.clpConfig.api_server (ne .Values.clpConfig.package.query_engine "presto") }}
```
why do we disable api server? Shouldn't we also disable query scheduler and worker if clp search is not needed?
currently, the API server does not work with Presto.
we do have this specified in tools/deployment/package-helm/templates/query-scheduler-deployment.yaml as well
{{- if ne .Values.clpConfig.package.query_engine "presto" }}
though, given some users may want to use both the API Server and Presto, i think we may discard the .Values.clpConfig.package.query_engine checks and simply check whether query_scheduler / query_worker / API is null before launching the services. what do you think?
cc @kirkrodrigues for discussion since this is interface related
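A minimal sketch of that alternative gating (illustrative only; whether these clpConfig blocks can actually be null is an assumption to be settled in the discussion):

```yaml
{{- /* Hypothetical: deploy each service only when its config block is non-null,
       instead of switching on clpConfig.package.query_engine. */}}
{{- if .Values.clpConfig.query_scheduler }}
# ... query-scheduler Deployment ...
{{- end }}
{{- if .Values.clpConfig.api_server }}
# ... api-server Deployment ...
{{- end }}
```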
```diff
@@ -17,6 +17,14 @@ image:
   repository: "ghcr.io/y-scope/clp/clp-package"
```
I'm getting Presto internal error, possibly due to the recent timestamp changes f82e611. Can you reproduce the error?
🤦 right. i can reproduce once i merge main into the current branch.
@coderabbitai note #1788 is a BREAKING change that broke CLP Presto, as in the CLP Presto Connector cannot handle the new Timestamp column type at the moment. submit an issue to track
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
# Get details about PR #1788 to understand the breaking change
gh pr view 1788 --json title,url,mergedAt,body --repo y-scope/clp

# Create an issue to track the breaking change
gh issue create \
  --repo y-scope/clp \
  --title "Presto Connector cannot handle new Timestamp column type from PR #1788" \
  --body "## Issue Description
The CLP Presto Connector currently cannot handle the new Timestamp column type introduced in PR #1788, causing internal errors when querying with Presto.

## Context
- **Breaking PR**: #1788 (commit f82e6114160a6addd4727259906bcf621ac9912c)
- **Affected Component**: CLP Presto Connector
- **Impact**: Presto queries fail with internal errors

The issue was discovered during testing of PR #2004 (Presto Helm chart support). The error can be reproduced after merging main into the Presto feature branch.

## References
- PR #1788: https://github.com/y-scope/clp/pull/1788
- PR #2004: https://github.com/y-scope/clp/pull/2004
- Comment: https://github.com/y-scope/clp/pull/2004#discussion_r2821243815

Reported by: @junhaoliao" \
  --assignee junhaoliao
```
Length of output: 5337
@junhaoliao, I've created issue #2008 to track this breaking change. The issue documents that PR #1788 (which introduced the new Timestamp column type to replace DateString and bumped the archive version to 0.5.0) broke the CLP Presto Connector's ability to handle the new column type. The issue has been assigned to you and includes references to both PRs and the relevant commit.
🐰✨
/_/
( o.o )
^ <
"Issue tracked!"
🧠 Learnings used
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1818
File: tools/deployment/package-helm/templates/configmap.yaml:12-12
Timestamp: 2025-12-19T05:03:42.629Z
Learning: In the y-scope/clp Helm chart, the `api_server.host` configuration field in the ConfigMap (tools/deployment/package-helm/templates/configmap.yaml) is unused because no other k8s internal services need to reach the API server—it's only accessed from outside the cluster via NodePort.
```yaml
- "/bin/sh"
- "-c"
- |
  set -e
  cp /etc/presto-config/presto-worker-config-config.properties /opt/presto-server/etc/config.properties
  cp /etc/presto-config/presto-worker-config-node.properties /opt/presto-server/etc/node.properties

  # Query coordinator for version and add it to config
  DISCOVERY_URI="http://{{ include "clp.fullname" . }}-presto-coordinator:8080"
  VERSION=$(wget -q -O - "${DISCOVERY_URI}/v1/info")
  PRESTO_VERSION=$(echo "$VERSION" \
    | sed 's/.*"version":"//' \
    | sed 's/".*//')
  echo "presto.version=${PRESTO_VERSION}" >> /opt/presto-server/etc/config.properties

  # Set node identity from hostname
  HOSTNAME=$(hostname)
  HOST_IP=$(hostname -i)
  echo "node.internal-address=${HOST_IP}" >> /opt/presto-server/etc/node.properties
  echo "node.id=${HOSTNAME}" >> /opt/presto-server/etc/node.properties
```
This script looks scary. Does it mimic the functionality of tools/deployment/presto-clp/worker/scripts/generate-configs.sh? I guess we don't use the script directly because of missing dependencies?
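On the fragility point: the sed pipeline grabs the first `"version":"` match in the raw response body, which breaks if the JSON layout shifts. A JSON-aware alternative, sketched in Python purely for illustration (the payload shape shown is an assumption about the coordinator's `/v1/info` response), would be less brittle:

```python
import json

# Hypothetical /v1/info payload; the real response shape may differ.
sample = '{"nodeVersion": {"version": "0.293"}, "coordinator": true}'

def extract_version(body: str) -> str:
    """Pull the server version out of the /v1/info JSON instead of sed string surgery."""
    info = json.loads(body)
    return info["nodeVersion"]["version"]

print(extract_version(sample))  # → 0.293
```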
- Replace `query_engine` conditions with modular flags in Helm templates.
- Refactor Presto configurations to use dedicated flags and scripts.
- Enhance worker and reducer node scheduling logic.
- Update documentation to reflect Presto integration and better configuration flexibility.
TODO: address #1957 since we're updating the Presto setup docs in this PR.
Description
Add Presto coordinator and worker deployments to the CLP Helm chart, gated by
clpConfig.package.query_engine: "presto". When Presto mode is enabled, the chart deploys a Presto coordinator (single replica) and configurable Presto worker(s) instead of the clp-s query pipeline (query scheduler, query workers, reducers). The api-server is also excluded in Presto mode.

Checklist
breaking change.
Validation performed
1. Helm template renders correctly (Presto mode)
Task: Verify the Helm chart renders all Presto resources and excludes query/reducer/api-server templates when `query_engine` is `"presto"`.

Command:
Output:
Explanation: Presto coordinator, service, and worker templates are rendered. Api-server, query scheduler, query worker, and reducer templates are correctly excluded.
2. Helm template renders correctly (clp-s mode)
Task: Verify the clp-s query pipeline resources are rendered when `query_engine` is `"clp-s"`, and Presto resources are excluded.

Command:
Output:
Explanation: Api-server, query scheduler, query worker, reducer, and results cache templates are rendered. Presto templates are correctly excluded. (Shared infrastructure templates omitted for brevity.)
3. Helm template renders correctly using the docs' presto-values.yaml
Task: Verify that a `presto-values.yaml` file created by following the `guides-using-presto.md` documentation renders without errors.

presto-values.yaml (following the docs, with placeholders replaced):
Command:
Output:
Explanation: The template renders without errors using the exact values file from the docs
(with placeholders filled in). The garbage-collector is correctly excluded because `results_cache.retention_period` is set to `null` and `archive_output.retention_period` defaults to `null`, so there is nothing to garbage-collect.

4. Multi-node cluster: clp-s mode (default)
Task: Verify that the Helm chart deploys correctly in clp-s mode on a multi-node kind cluster with dedicated worker nodes.
Command:
Output (final pod status):
Explanation: All clp-s pipeline pods are running (api-server, query-scheduler, 2x query-worker, 2x reducer, results-cache, etc.). No Presto pods are present.
Node scheduling verification:
Compression workers are scheduled on compression-labeled nodes and query workers on query-labeled nodes.
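The label-based scheduling above is typically expressed with nodeSelector entries along these lines (a sketch; the label keys shown are assumptions, not necessarily the chart's actual interface):

```yaml
# Illustrative nodeSelector fragments; the label keys are hypothetical.
queryWorker:
  nodeSelector:
    clp.yscope.com/query-worker: "true"
compressionWorker:
  nodeSelector:
    clp.yscope.com/compression-worker: "true"
```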
5. Multi-node cluster: Presto mode
Task: Verify that the Helm chart deploys correctly in Presto mode on a multi-node kind cluster.
The test script was temporarily modified to add `-f /tmp/presto-values.yaml` to the `helm install` command and remove the query worker/reducer `--set` flags (since those components are not deployed in Presto mode).
Command:
Output (final pod status):
Explanation: Presto coordinator and worker pods are running. Api-server, query-scheduler,
query-worker, and reducer pods are correctly absent. The Presto worker init container successfully detected the coordinator version and initialized its config.
Presto pod verification:
Compressed the sample log file postgresql.jsonl, performed queries in the WebUI, and observed that results were correctly returned.
Summary by CodeRabbit
New Features
Documentation