Skip to content

fix(setup.sh): set Temporal namespace retention on create#2675

Open
kirson-git wants to merge 1 commit into
NVIDIA:mainfrom
kirson-git:fix/temporal-namespace-retention
Open

fix(setup.sh): set Temporal namespace retention on create#2675
kirson-git wants to merge 1 commit into
NVIDIA:mainfrom
kirson-git:fix/temporal-namespace-retention

Conversation

@kirson-git

Copy link
Copy Markdown

Problem

helm-prereqs/setup.sh creates the Temporal namespaces (cloud, site, flow, <site-uuid>) with:

temporal operator namespace create -n <ns> --address ... 2>/dev/null || true

No --retention is passed, and 2>/dev/null || true swallows any failure. Depending on the Temporal server's default-retention policy the create can be rejected/incomplete; the namespace then never exists and the nico-rest cloud/site/flow workers CrashLoop with Namespace "<ns>" is not found. The silent || true makes this invisible during install.

Fix

Pass an explicit --retention 72h on each namespace create so it succeeds deterministically regardless of server defaults. (Reproduced on a clean v0.10.3 install; the same code is still present on main.)

🤖 Generated with Claude Code

The Temporal namespaces (cloud, site, flow, <site-uuid>) are created with
`temporal operator namespace create -n <ns> --address ... 2>/dev/null || true`
with no --retention. Depending on the Temporal server's default-retention
policy the create can be rejected/incomplete, and the `2>/dev/null || true`
hides the failure. The namespace then never exists, and nico-rest cloud/site/
flow workers CrashLoop with 'Namespace "<ns>" is not found'.

Set an explicit --retention (72h) on each namespace create so it succeeds
deterministically regardless of server defaults.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: kirson-git <ekirson@nvidia.com>
@kirson-git kirson-git requested review from a team and shayan1995 as code owners June 17, 2026 14:29
@copy-pr-bot

copy-pr-bot Bot commented Jun 17, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Summary by CodeRabbit

  • Chores
    • Updated deployment prerequisites script to configure Temporal namespaces with standardized 72-hour retention policies. This change ensures that cloud, site, flow, and per-site Temporal namespace instances consistently retain data for 72 hours, establishing uniform and predictable data lifecycle behavior across all environment configurations during setup.

Walkthrough

The helm-prereqs/setup.sh script is updated to append --retention 72h to all temporal operator namespace create invocations executed via kubectl exec. This affects the cloud, site, and flow namespaces (lines 633–638) and the per-site namespace identified by NICO_SITE_UUID (line 766).

Changes

Temporal Namespace Retention Configuration

Layer / File(s) Summary
Add --retention 72h to all Temporal namespace create commands
helm-prereqs/setup.sh
Appends --retention 72h to the temporal operator namespace create command strings for the cloud, site, flow, and NICO_SITE_UUID namespaces within kubectl exec invocations.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: adding explicit Temporal namespace retention policy on creation.
Description check ✅ Passed The description comprehensively explains the problem (missing retention policy causing silent failures), the fix (adding --retention 72h), and provides context from a clean installation test.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@helm-prereqs/setup.sh`:
- Around line 632-638: The kubectl exec commands for temporal namespace creation
(lines 633, 635, 638, and line 766) are unconditionally suppressing all stderr
output and errors with `2>/dev/null || true`, which masks real setup failures
like TLS, authentication, or connectivity issues. Instead of blanket error
suppression, implement idempotent namespace creation by first checking if each
namespace already exists before attempting to create it, or by using a
conditional approach that only ignores "already exists" errors while allowing
actual failures to be logged and propagated. This maintains idempotency while
preserving fail-fast behavior and enabling clear error diagnostics as required
by the helm-prereqs guidelines.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a21fe10a-5f06-450e-92fd-0775dc5e8e04

📥 Commits

Reviewing files that changed from the base of the PR and between bd899c6 and d199af0.

📒 Files selected for processing (1)
  • helm-prereqs/setup.sh

Comment thread helm-prereqs/setup.sh
Comment on lines 632 to +638
kubectl exec -n temporal deploy/temporal-admintools -- \
sh -c "temporal operator namespace create -n cloud --address ${_TEMPORAL_ADDR} ${_TEMPORAL_TLS}" 2>/dev/null || true
sh -c "temporal operator namespace create -n cloud --retention 72h --address ${_TEMPORAL_ADDR} ${_TEMPORAL_TLS}" 2>/dev/null || true
kubectl exec -n temporal deploy/temporal-admintools -- \
sh -c "temporal operator namespace create -n site --address ${_TEMPORAL_ADDR} ${_TEMPORAL_TLS}" 2>/dev/null || true
sh -c "temporal operator namespace create -n site --retention 72h --address ${_TEMPORAL_ADDR} ${_TEMPORAL_TLS}" 2>/dev/null || true
# flow Temporal namespace — required by NICo Flow workers; pod panics on startup if absent.
kubectl exec -n temporal deploy/temporal-admintools -- \
sh -c "temporal operator namespace create -n flow --address ${_TEMPORAL_ADDR} ${_TEMPORAL_TLS}" 2>/dev/null || true
sh -c "temporal operator namespace create -n flow --retention 72h --address ${_TEMPORAL_ADDR} ${_TEMPORAL_TLS}" 2>/dev/null || true

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not suppress all namespace-creation failures.

--retention 72h is a good fix, but Line 633, Line 635, Line 638, and Line 766 still unconditionally hide failures (2>/dev/null || true). This can mask real setup faults (TLS/auth/connectivity/CLI errors) and breaks fail-fast behavior.

Suggested hardening (idempotent + fail-fast)
+create_temporal_namespace() {
+    local ns="$1"
+    if kubectl exec -n temporal deploy/temporal-admintools -- \
+        sh -c "temporal operator namespace describe -n '${ns}' --address ${_TEMPORAL_ADDR} ${_TEMPORAL_TLS}" \
+        >/dev/null 2>&1; then
+        echo "Temporal namespace '${ns}' already exists"
+        return 0
+    fi
+
+    kubectl exec -n temporal deploy/temporal-admintools -- \
+        sh -c "temporal operator namespace create -n '${ns}' --retention 72h --address ${_TEMPORAL_ADDR} ${_TEMPORAL_TLS}"
+}
+
-kubectl exec -n temporal deploy/temporal-admintools -- \
-    sh -c "temporal operator namespace create -n cloud --retention 72h --address ${_TEMPORAL_ADDR} ${_TEMPORAL_TLS}" 2>/dev/null || true
-kubectl exec -n temporal deploy/temporal-admintools -- \
-    sh -c "temporal operator namespace create -n site --retention 72h --address ${_TEMPORAL_ADDR} ${_TEMPORAL_TLS}" 2>/dev/null || true
-kubectl exec -n temporal deploy/temporal-admintools -- \
-    sh -c "temporal operator namespace create -n flow --retention 72h --address ${_TEMPORAL_ADDR} ${_TEMPORAL_TLS}" 2>/dev/null || true
+create_temporal_namespace "cloud"
+create_temporal_namespace "site"
+create_temporal_namespace "flow"
...
-kubectl exec -n temporal deploy/temporal-admintools -- \
-    sh -c "temporal operator namespace create -n '${NICO_SITE_UUID}' --retention 72h --address ${_TEMPORAL_ADDR} ${_TEMPORAL_TLS}" 2>/dev/null || true
+create_temporal_namespace "${NICO_SITE_UUID}"

As per coding guidelines, "**/*.sh: Review shell scripts for ... error propagation ..." and "helm-prereqs/**: ... idempotency, and clear failure messages."

Also applies to: 765-766

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@helm-prereqs/setup.sh` around lines 632 - 638, The kubectl exec commands for
temporal namespace creation (lines 633, 635, 638, and line 766) are
unconditionally suppressing all stderr output and errors with `2>/dev/null ||
true`, which masks real setup failures like TLS, authentication, or connectivity
issues. Instead of blanket error suppression, implement idempotent namespace
creation by first checking if each namespace already exists before attempting to
create it, or by using a conditional approach that only ignores "already exists"
errors while allowing actual failures to be logged and propagated. This
maintains idempotency while preserving fail-fast behavior and enabling clear
error diagnostics as required by the helm-prereqs guidelines.

Source: Coding guidelines

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant