fix(setup.sh): set Temporal namespace retention on create#2675
fix(setup.sh): set Temporal namespace retention on create#2675kirson-git wants to merge 1 commit into
Conversation
The Temporal namespaces (cloud, site, flow, <site-uuid>) are created with `temporal operator namespace create -n <ns> --address ... 2>/dev/null || true` with no --retention. Depending on the Temporal server's default-retention policy the create can be rejected/incomplete, and the `2>/dev/null || true` hides the failure. The namespace then never exists, and nico-rest cloud/site/ flow workers CrashLoop with 'Namespace "<ns>" is not found'. Set an explicit --retention (72h) on each namespace create so it succeeds deterministically regardless of server defaults. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: kirson-git <ekirson@nvidia.com>
Summary by CodeRabbit
WalkthroughThe ChangesTemporal Namespace Retention Configuration
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~2 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@helm-prereqs/setup.sh`:
- Around line 632-638: The kubectl exec commands for temporal namespace creation
(lines 633, 635, 638, and line 766) are unconditionally suppressing all stderr
output and errors with `2>/dev/null || true`, which masks real setup failures
like TLS, authentication, or connectivity issues. Instead of blanket error
suppression, implement idempotent namespace creation by first checking if each
namespace already exists before attempting to create it, or by using a
conditional approach that only ignores "already exists" errors while allowing
actual failures to be logged and propagated. This maintains idempotency while
preserving fail-fast behavior and enabling clear error diagnostics as required
by the helm-prereqs guidelines.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: a21fe10a-5f06-450e-92fd-0775dc5e8e04
📒 Files selected for processing (1)
helm-prereqs/setup.sh
| kubectl exec -n temporal deploy/temporal-admintools -- \ | ||
| sh -c "temporal operator namespace create -n cloud --address ${_TEMPORAL_ADDR} ${_TEMPORAL_TLS}" 2>/dev/null || true | ||
| sh -c "temporal operator namespace create -n cloud --retention 72h --address ${_TEMPORAL_ADDR} ${_TEMPORAL_TLS}" 2>/dev/null || true | ||
| kubectl exec -n temporal deploy/temporal-admintools -- \ | ||
| sh -c "temporal operator namespace create -n site --address ${_TEMPORAL_ADDR} ${_TEMPORAL_TLS}" 2>/dev/null || true | ||
| sh -c "temporal operator namespace create -n site --retention 72h --address ${_TEMPORAL_ADDR} ${_TEMPORAL_TLS}" 2>/dev/null || true | ||
| # flow Temporal namespace — required by NICo Flow workers; pod panics on startup if absent. | ||
| kubectl exec -n temporal deploy/temporal-admintools -- \ | ||
| sh -c "temporal operator namespace create -n flow --address ${_TEMPORAL_ADDR} ${_TEMPORAL_TLS}" 2>/dev/null || true | ||
| sh -c "temporal operator namespace create -n flow --retention 72h --address ${_TEMPORAL_ADDR} ${_TEMPORAL_TLS}" 2>/dev/null || true |
There was a problem hiding this comment.
Do not suppress all namespace-creation failures.
--retention 72h is a good fix, but Line 633, Line 635, Line 638, and Line 766 still unconditionally hide failures (2>/dev/null || true). This can mask real setup faults (TLS/auth/connectivity/CLI errors) and breaks fail-fast behavior.
Suggested hardening (idempotent + fail-fast)
+create_temporal_namespace() {
+ local ns="$1"
+ if kubectl exec -n temporal deploy/temporal-admintools -- \
+ sh -c "temporal operator namespace describe -n '${ns}' --address ${_TEMPORAL_ADDR} ${_TEMPORAL_TLS}" \
+ >/dev/null 2>&1; then
+ echo "Temporal namespace '${ns}' already exists"
+ return 0
+ fi
+
+ kubectl exec -n temporal deploy/temporal-admintools -- \
+ sh -c "temporal operator namespace create -n '${ns}' --retention 72h --address ${_TEMPORAL_ADDR} ${_TEMPORAL_TLS}"
+}
+
-kubectl exec -n temporal deploy/temporal-admintools -- \
- sh -c "temporal operator namespace create -n cloud --retention 72h --address ${_TEMPORAL_ADDR} ${_TEMPORAL_TLS}" 2>/dev/null || true
-kubectl exec -n temporal deploy/temporal-admintools -- \
- sh -c "temporal operator namespace create -n site --retention 72h --address ${_TEMPORAL_ADDR} ${_TEMPORAL_TLS}" 2>/dev/null || true
-kubectl exec -n temporal deploy/temporal-admintools -- \
- sh -c "temporal operator namespace create -n flow --retention 72h --address ${_TEMPORAL_ADDR} ${_TEMPORAL_TLS}" 2>/dev/null || true
+create_temporal_namespace "cloud"
+create_temporal_namespace "site"
+create_temporal_namespace "flow"
...
-kubectl exec -n temporal deploy/temporal-admintools -- \
- sh -c "temporal operator namespace create -n '${NICO_SITE_UUID}' --retention 72h --address ${_TEMPORAL_ADDR} ${_TEMPORAL_TLS}" 2>/dev/null || true
+create_temporal_namespace "${NICO_SITE_UUID}"As per coding guidelines, "**/*.sh: Review shell scripts for ... error propagation ..." and "helm-prereqs/**: ... idempotency, and clear failure messages."
Also applies to: 765-766
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@helm-prereqs/setup.sh` around lines 632 - 638, The kubectl exec commands for
temporal namespace creation (lines 633, 635, 638, and line 766) are
unconditionally suppressing all stderr output and errors with `2>/dev/null ||
true`, which masks real setup failures like TLS, authentication, or connectivity
issues. Instead of blanket error suppression, implement idempotent namespace
creation by first checking if each namespace already exists before attempting to
create it, or by using a conditional approach that only ignores "already exists"
errors while allowing actual failures to be logged and propagated. This
maintains idempotency while preserving fail-fast behavior and enabling clear
error diagnostics as required by the helm-prereqs guidelines.
Source: Coding guidelines
Problem
helm-prereqs/setup.shcreates the Temporal namespaces (cloud,site,flow,<site-uuid>) with:No
--retentionis passed, and2>/dev/null || trueswallows any failure. Depending on the Temporal server's default-retention policy the create can be rejected/incomplete; the namespace then never exists and the nico-rest cloud/site/flow workers CrashLoop withNamespace "<ns>" is not found. The silent|| truemakes this invisible during install.Fix
Pass an explicit
--retention 72hon each namespace create so it succeeds deterministically regardless of server defaults. (Reproduced on a clean v0.10.3 install; the same code is still present onmain.)🤖 Generated with Claude Code