diff --git a/README.md b/README.md index 2c523b6a..a0a656aa 100644 --- a/README.md +++ b/README.md @@ -14,7 +14,7 @@ While developing Talm, we aimed to achieve the following goals: - **GitOps Friendly**: The patches generated do not contain sensitive data, allowing them to be stored in Git in an unencrypted, open format. For scenarios requiring complete configurations, the `--full` option allows the obtain a complete config that can be used for matchbox and other solutions. -- **Simplicity of Use**: You no longer need to pass connection options for each specific server; they are saved along with the templating results into a separate file. This allows you to easily apply one or multiple files in batch using a syntax similar to `kubectl apply -f node1.yaml -f node2.yaml`. +- **Simplicity of Use**: You no longer need to pass connection options for each specific server; they are saved along with the templating results into a separate file. Per-node configuration sits in a single modeline-annotated `nodes/.yaml`; the first `-f` file is the anchor and any subsequent `-f` files stack onto its rendered config as side-patches (see "Apply with side-patches" below). - **Compatibility with talosctl**: We strive to maintain compatibility with the upstream project in patches and configurations. The configurations you obtain can be used with the official tools like talosctl and Omni. @@ -176,9 +176,9 @@ cluster: > > 2. **Pre-apply drift preview** (`--skip-drift-preview` opt-out, default on). Reads the node's current MachineConfig via COSI and prints a `+`/`-`/`~`/`=` diff of what's about to change, keyed by `(kind, name)`. Informational only — never blocks. The `-` lines are the most useful: they surface stale documents from a previous apply that the new render no longer emits (e.g. an `eth1` LinkConfig lingering after a migration to `eth0`). Reading the current config requires the auth path — `MachineConfig` is a Sensitive COSI resource and is unreachable on the `--insecure` maintenance connection; the gate prints `drift verification unavailable on maintenance connection` (per-node-prefixed on multi-node insecure apply) and proceeds in that case. Secret-bearing field values (`cluster.token`, `cluster.{ca,aggregatorCA,serviceAccount,etcd.ca}.key`, `machine.token` / `machine.ca.key`, the `cluster.acceptedCAs` / `machine.acceptedCAs` slices, `WireguardConfig.privateKey`, the `peers` slice carrying `presharedKey`s) are redacted by default — both sides render as `***redacted (len=N)***` so a rotation surfaces as different-length sentinels without leaking the value. Pass `--show-secrets-in-drift` to see the raw values verbatim (debugging only — disables the redaction for the run). **`--dry-run` runs this gate** — the diff is read-only and "show me what would change" is exactly the dry-run contract. > -> 3. **Post-apply state verification** (`--skip-post-apply-verify` opt-out, **default off** until the Talos-mutated-field allowlist lands — see [#172](https://github.com/cozystack/talm/issues/172)). After `ApplyConfiguration` returns success, re-reads the on-node MachineConfig and structurally compares it against the bytes that were sent. Divergence blocks the apply chain with a per-document diff, primarily catching silent doc drops (Talos parser ignored an unknown field) and controller reverts. Disabled by default because Talos mutates a handful of leaf fields post-apply (cert hashes, timestamps) that would surface as false-positive divergence without an allowlist. The verify runs only on `--mode=no-reboot`. `--mode=staged`, `--mode=try`, `--mode=reboot`, and `--mode=auto` all skip the gate — each for a documented reason: staged stores rather than activates; try auto-rolls back; reboot kills the COSI connection mid-verify; auto is promoted by Talos to REBOOT internally when the change requires it, so the verify would race the reboot. `--dry-run` skips it too. +> 3. **Post-apply state verification** (`--skip-post-apply-verify` opt-out, **default off** pending a Talos-mutated-field allowlist). After `ApplyConfiguration` returns success, re-reads the on-node MachineConfig and structurally compares it against the bytes that were sent. Divergence blocks the apply chain with a per-document diff, primarily catching silent doc drops (Talos parser ignored an unknown field) and controller reverts. Disabled by default because Talos mutates a handful of leaf fields post-apply (cert hashes, timestamps) that would surface as false-positive divergence without an allowlist. The verify runs only on `--mode=no-reboot`. `--mode=staged`, `--mode=try`, `--mode=reboot`, and `--mode=auto` all skip the gate — each for a documented reason: staged stores rather than activates; try auto-rolls back; reboot kills the COSI connection mid-verify; auto is promoted by Talos to REBOOT internally when the change requires it, so the verify would race the reboot. `--dry-run` skips it too. > -> 4. **Post-upgrade version verify** (`--skip-post-upgrade-verify` opt-out, default on — the gate runs). After `talm upgrade` reports success, waits the configured reconcile window (default 90s; tune via `--post-upgrade-reconcile-window` for slow hardware / large image pulls) for the node to finish booting, then reads `runtime.Version` COSI and compares the running version's `(Major, Minor)` contract against the contract parsed from the target image tag. Point releases share a minor contract; cross-minor mismatch surfaces as a hint-bearing blocker. Catches the silent A/B rollback case where the upgrade RPC acks success but Talos rolled back to the previous partition (cross-vendor image, missing extensions, failed boot readiness check, slow boot exceeding the configured window). Best-effort surrender on digest-pinned images and unparseable tags. See [#175](https://github.com/cozystack/talm/issues/175) for the reproduction. +> 4. **Post-upgrade version verify** (`--skip-post-upgrade-verify` opt-out, default on — the gate runs). After `talm upgrade` reports success, waits the configured reconcile window (default 90s; tune via `--post-upgrade-reconcile-window` for slow hardware / large image pulls) for the node to finish booting, then reads `runtime.Version` COSI and compares the running version's `(Major, Minor)` contract against the contract parsed from the target image tag. Point releases share a minor contract; cross-minor mismatch surfaces as a hint-bearing blocker. Catches the silent A/B rollback case where the upgrade RPC acks success but Talos rolled back to the previous partition (cross-vendor image, missing extensions, failed boot readiness check, slow boot exceeding the configured window). Best-effort surrender on digest-pinned images and unparseable tags. > > The skip flags don't suppress each other — pass them independently. On the `--insecure` (maintenance) path the gates are functionally unreachable for charts that drive discovery via `lookup` — those COSI lookups require an authenticated connection and the render itself errors before any gate runs. Charts that render fully offline (no `lookup` calls) reach the gates on `--insecure` as well, with the Phase 2 hooks degrading gracefully because the `MachineConfig` resource is Sensitive. @@ -192,6 +192,8 @@ Upgrade node: talm upgrade -f nodes/node1.yaml ``` +`talm upgrade` resolves the target installer image from `values.yaml::image` (the cluster-wide knob). To pick the new version, bump `values.yaml::image` and re-run `talm upgrade -f nodes/.yaml`; there is no need to re-template the node files first. Pass `--image ` to override per-invocation (e.g. for an experimental installer build); the flag wins over the `values.yaml` lookup. + Show diff: ```bash talm apply -f nodes/node1.yaml --dry-run @@ -214,6 +216,31 @@ talm template -f nodes/node1.yaml -I > > `talm template -f node.yaml` (with or without `-I`) does **not** apply the same overlay: its output is the rendered template plus the modeline and the auto-generated warning, byte-identical to what the template alone would produce. Routing it through the patcher would drop every YAML comment (including the modeline) and re-sort keys, breaking downstream commands that read the file back. Use `apply --dry-run` if you want to preview the exact bytes that will be sent to the node. +## Apply with side-patches + +`talm apply -f` accepts a chain of files. The FIRST `-f` is the **anchor** — it must carry a `# talm: nodes=[…], templates=[…]` modeline and live under a `talm init`'d project (Chart.yaml + secrets.yaml). Any subsequent `-f` files are **side-patches**: they are merged in order on top of the anchor's rendered config, and a single `ApplyConfiguration` is issued per node carrying the composed result. + +```bash +# Single node file (anchor only): +talm apply -f nodes/cp01.yaml + +# Side-patch stacked on top — useful for one-shot overlays +# (debug kubelet flags, temporary cert SANs, mode=staged drills): +talm apply -f nodes/cp01.yaml -f /tmp/debug-kubelet.yaml +``` + +Side-patches do not need to live under the project root; the first file anchors detection. Reversing the order is an error — the first file must be the rooted anchor. + +If you were relying on the previous "apply N independent node files" shape (`talm apply -f n1.yaml -f n2.yaml -f n3.yaml` triggering three separate applies), invoke `talm apply` once per node file instead: + +```bash +for n in nodes/*.yaml; do talm apply -f "$n"; done +``` + +The current shape rejects this misuse loudly: if any of the subsequent `-f` files carries its own `# talm: …` modeline, the apply errors out before any RPC fires with a hint pointing at the shell-loop pattern above. Stripping the modeline turns a file into a legitimate side-patch. + +Side-patches with a non-empty body are restricted to **single-node anchors**. The same body cannot be distinguished from per-node fields (hostname, address, VIP) versus cluster-wide knobs (NTP servers, KubeProxy mode) by static inspection, and stamping per-node fields across N machines is the original foot-gun the per-node-body guard was designed to prevent. If your anchor's `nodes=[…]` lists more than one target and your side-patch is non-empty, talm rejects the apply early with a hint pointing at the per-file shell loop. For cluster-wide overlays on multi-node anchors, fold the overlay into `values.yaml` or templates rather than passing it as a side-patch; for per-node overrides, generate per-node files via `talm template -I` and feed them into the per-file shell loop. + ## Using talosctl commands Talm offers a similar set of commands to those provided by talosctl. However, you can specify the --file option for them. @@ -269,6 +296,22 @@ Querying disks map example: \- will return the system disk device name +### `--set` vs `--set-string` for IP / version literals + +Helm's `--set` parser interprets dots in the right-hand side as YAML key nesting: + +```bash +talm template --set endpoint=10.0.0.1 # parsed as: endpoint: {10: {0: {0: 1}}} +``` + +For IP addresses, CIDR blocks, or version strings, use `--set-string` to keep the value verbatim: + +```bash +talm template --set-string endpoint=10.0.0.1 # parsed as: endpoint: "10.0.0.1" +``` + +`talm` warns on stderr when it detects an IP-, CIDR-, or version-shaped value in `--set` and points at `--set-string` as the fix. The warning is non-fatal — rendering proceeds with the (likely-broken) nested map so existing automation does not break. For values containing characters Helm's strvals treats specially (e.g. `=`, `,` inside the value, or content that should be opaque to all parsing), use `--set-literal` — it stores the entire RHS as a verbatim string without any escape interpretation. + ## Encryption Talm provides built-in encryption support using [age](https://age-encryption.org/) encryption. Sensitive files are encrypted with their values stored in SOPS format (`ENC[AGE,data:...]`), while YAML keys remain unencrypted for better readability. diff --git a/docs/apply-safety-gates-test-plan.md b/docs/apply-safety-gates-test-plan.md index 0e18ab4a..e3b77388 100644 --- a/docs/apply-safety-gates-test-plan.md +++ b/docs/apply-safety-gates-test-plan.md @@ -1,6 +1,6 @@ # Apply-time safety gates: test plan -A reference checklist for validating changes to the apply-time safety gates introduced in #172 / PR #173. Covers the contract tests that ship with the package plus the manual real-Talos validation steps that surface issues unit tests cannot. +A reference checklist for validating changes to the apply-time safety gates. Covers the contract tests that ship with the package plus the manual real-Talos validation steps that surface issues unit tests cannot. ## Build under test @@ -249,6 +249,6 @@ go vet ./... ## Known limitations / follow-ups -- **Talos-mutated-field allowlist** (open in #172): Phase 2B reports cert hashes / timestamps as divergence today; the verify is off by default until an allowlist lands. +- **Talos-mutated-field allowlist**: Phase 2B reports cert hashes / timestamps as divergence today; the verify is off by default until an allowlist lands. - **`talm upgrade` has no pre-upgrade gates** (Phase 2C runs *after*, not before): the upgrade flow wraps `talosctl upgrade` and doesn't route through `buildApplyClosure` / `applyOneFileDirectPatchMode`, so Phase 1 / Phase 2A do not run. Phase 2C (post-upgrade version verify) was added precisely to catch the silent-rollback class without that refactor. Full pre-upgrade gates would require reproducing the gate calls in `upgrade_handler.go` or refactoring the apply flow. - **Phase 1/2 on `--insecure`**: the safety gates can't run before the chart renders, and the chart's `lookup` calls need an authenticated COSI connection. Insecure path = effectively no gates today. diff --git a/docs/manual-test-plan.md b/docs/manual-test-plan.md index 7908aefd..81621dee 100644 --- a/docs/manual-test-plan.md +++ b/docs/manual-test-plan.md @@ -98,6 +98,23 @@ Expected: clear error mentioning the missing key path. rm -rf /tmp/talm-init-test ``` +### A8. `init --update` without `--preset` and without preset dep in Chart.yaml + +```bash +mkdir -p /tmp/talm-update-test && cd /tmp/talm-update-test +cat > Chart.yaml <&1 | grep "^hint:"`) MUST still surface the inner `add a preset chart` hint after the unwrap. A regression that drops the hint along with the rewrap is a UX regression. + **Regression anchor**: A6's error must reference the missing-key path explicitly. A bare `read key file: open ...: no such file or directory` with no follow-up hint is a UX regression — the operator should see a clear recovery path (`run \`talm init\` to generate a new key, or restore your backup`). ## B. Render / template @@ -137,7 +154,9 @@ diff /tmp/inplace-before.yaml nodes/node0.yaml cp /tmp/inplace-before.yaml nodes/node0.yaml # restore ``` -Expected: `Updated.` on stdout. The diff shows that the rendered body replaces the previous contents — **including any user-added comments**. This is by design (`-I` is rewrite, not merge), but operators often trip on it; note in your test report. +Expected: `Updated.` on stdout. The rendered body replaces the previous body of the file — but **operator-authored comments above the modeline are preserved verbatim**. Comments embedded in the YAML body still get overwritten, since `-I` re-renders the body and Helm has no way to round-trip user-edits made there. + +Regression anchor: write `nodes/node0.yaml` as `# Operator note A\n# Operator note B\n# talm: ...\n`. After `template -I -f nodes/node0.yaml`, the first two lines (`# Operator note A`, `# Operator note B`) MUST still be there, followed by the modeline, the talm-rendered warning header, then the body. Re-run idempotent: a second `template -I` keeps the same prefix structure — leading comments don't drift, multiply, or disappear. ### B5. Render with stale chart preset @@ -205,6 +224,77 @@ Expected: same drift preview as C5, but the secret paths render verbatim — no Regression anchor: `--show-secrets-in-drift` is operator opt-in, never default. Verify by running `talm apply --help` and confirming the flag default is `false`. +### C6a. Apply chain — first `-f` anchors project root, later `-f` files stack as side-patches + +```bash +cat > /tmp/side-ntp.yaml <<'EOF' +machine: + time: + servers: + - 2.example.test +EOF +/tmp/talm-safety apply --dry-run \ + -f nodes/node0.yaml -f /tmp/side-ntp.yaml +``` + +Expected: progress line includes `side-patches=[/tmp/side-ntp.yaml]` alongside `nodes=[…]` and `endpoints=[…]`. Drift preview shows both the anchor's rendered config diff AND the side-patch's `machine.time.servers: added [2.example.test]` mutation. The side-patch is stacked via `engine.MergeFileAsPatch`; last writer wins on overlapping keys. + +Regression anchor: `/tmp/side-ntp.yaml` lives outside the project root. The first `-f` file (`nodes/node0.yaml`) is the SOLE anchor for root detection; later files don't get root-validated. Verify by reversing order — `apply --dry-run -f /tmp/side-ntp.yaml -f nodes/node0.yaml` should fail because the first file has no project root. + +### C6b. Non-modelined anchor with side-patches rejected with reorder hint + +```bash +cat > nodes/orphan.yaml <<'EOF' +machine: {} +EOF +/tmp/talm-safety apply --dry-run -f nodes/orphan.yaml -f /tmp/side-ntp.yaml +``` + +Expected: error `first -f file nodes/orphan.yaml lacks a modeline; side-patches require a modelined anchor` with hint that the first `-f` file must carry `# talm: nodes=[…], templates=[…]` so talm knows what to render before stacking. Note: the orphan file lives inside the project tree because the first `-f` file must have a Chart.yaml + secrets.yaml ancestor for root detection — see C6b-pre below for the root-detection failure mode. + +### C6b-pre. Anchor outside any project root is rejected at root detection + +```bash +cat > /tmp/orphan.yaml <<'EOF' +machine: {} +EOF +/tmp/talm-safety apply --dry-run -f /tmp/orphan.yaml +``` + +Expected: error `failed to detect project root for first file /tmp/orphan.yaml (Chart.yaml and secrets.yaml not found)` with hint `the first -f file anchors the project root; place it inside a talm init'd project, or reorder the -f chain so a rooted file comes first`. This gate fires BEFORE the modeline-vs-orphan dispatch, so any orphan placed outside a project root can never reach the direct-patch path. + +### C6c. Orphan file inside project root dispatches to direct-patch when context provides nodes + +```bash +# Build a talosconfig with explicit nodes field +cp talosconfig /tmp/talosconfig-with-nodes +# (edit to add nodes: [...] under contexts..nodes — YAML or yq) +/tmp/talm-safety apply --dry-run \ + --talosconfig /tmp/talosconfig-with-nodes \ + -f nodes/orphan.yaml +``` + +Expected: progress line `- talm: file=nodes/orphan.yaml, nodes=[…], endpoints=[…]` — targets resolved from the talosconfig context, no early "no nodes available" rejection. The dispatch reaches the direct-patch path (chart-render-with-orphan-as-patch). Note: the orphan must live inside the project tree to clear root detection (see C6b-pre). + +### C6d. Orphan file with no source for nodes fails late with three-way hint + +```bash +# Same talosconfig but WITHOUT a nodes: field in the active context +/tmp/talm-safety apply --dry-run \ + --talosconfig /tmp/talosconfig-no-nodes \ + -f nodes/orphan.yaml +``` + +Expected: `no nodes to target for nodes/orphan.yaml` with hint `the file lacks a # talm: nodes=[…] modeline, no --nodes flag was passed, and the active talosconfig context carries no nodes; pass --nodes explicitly or supply a modelined node file`. The hint must name all three sources so the operator knows which to populate. + +Regression anchor: the "no nodes" check fires AFTER the talosconfig context fallback runs, not before. A regression that re-introduces the early `len(GlobalArgs.Nodes) == 0` gate would block C6c silently. + +### C6e. Progress lines use comma-joined bracketed lists + +Inspect any apply progress line from C1, C3, C6a, C6c. Expect `nodes=[n1,n2,n3]` and `endpoints=[e1,e2]` — comma-joined, bracket-framed, no `[n1 n2 n3]` Go slice format artifact. + +Regression anchor: the format uses `strings.Join(slice, ",")` with surrounding brackets in the format string; `%s` or `%v` directly on a `[]string` produces the space-separated `[n1 n2 n3]` artifact and is a regression. Verify with a multi-node modeline (`# talm: nodes=["1.2.3.4","5.6.7.8"], ...`) → progress line shows `nodes=[1.2.3.4,5.6.7.8]`. + ### C7. Phase 1 walker rejects malformed net-addr fields before the RPC When a rendered MachineConfig carries a malformed value in any of the new walker-covered fields, Phase 1 must block before the apply RPC fires: @@ -505,6 +595,25 @@ Expected: refuses with `FailedPrecondition: blockdevice "" is in use by dis ## I. Shell completion +### I1a. Per-flag and positional completion + +Install bash completion: + +```bash +/tmp/talm-safety completion bash > /tmp/talm-completion.bash +source /tmp/talm-completion.bash +``` + +Then exercise each target. Each expectation below is a forward-looking check the operator runs interactively (or via `__complete ""` for scripted assertions). + +- `talm init --preset ` → curated list including `cozystack`. Sourced from `pkg/generated/presets.go::AvailablePresets()`; no generic file completion. +- `talm apply --mode ` → `auto`, `no-reboot`, `reboot`, `staged`, `try` (the apply-mode enum). +- `talm apply --file ` → only `nodes/*.yaml` files that carry a valid `# talm: …` modeline. Non-modelined yaml files in the project tree do NOT surface. Same for `talm template --file ` and `talm upgrade --file `. +- `talm template --values ` / `--with-secrets ` → file completion narrowed to `.yaml` / `.yml` extensions (`ShellCompDirectiveFilterFileExt`). +- `talm --nodes ` / `talm --endpoints ` → union of nodes / endpoints declared across every context in the active talosconfig. + +Regression anchor: positional completion (`talm apply ` without a flag) MUST NOT surface anything — apply / template / upgrade declare `cobra.NoArgs` so the positional slot is empty. A regression that wires `ValidArgsFunction` to a populated completer would silently re-introduce a dead-completion code path. + ### I1. Generate completion for each shell ```bash @@ -527,10 +636,24 @@ Expected: every shell prints a script that parses (for bash/zsh syntax-check con /tmp/talm-safety template -f nodes/node0.yaml --set floatingIP=0700 ``` -Expected: with the post-#163 chart, fails fast with `talm: floatingIP "0700" is not a valid IPv4 / IPv6 literal`. Pre-#163 chart silently renders an invalid VIP. +Expected: with an IPv4-validating chart, fails fast with `talm: floatingIP "0700" is not a valid IPv4 / IPv6 literal`. A chart without the validation silently renders an invalid VIP. **Operator footgun**: `--set floatingIP=198.51.100.1` *may* be parsed as the float `198.51 × 100.1` by Helm's loose type-coercion. For IPs use `--set-string floatingIP="198.51.100.1"` or put it in `values.yaml`. +`--set` emits a non-fatal warning to stderr when the RHS matches an IP-shaped or semver-shaped literal: + +```bash +/tmp/talm-safety template -f nodes/node0.yaml --set endpoint=10.0.0.1 2>&1 | grep "^talm:" +# Expected: warning recommending --set-string for IP / version literals. +/tmp/talm-safety template -f nodes/node0.yaml --set-string endpoint=10.0.0.1 2>&1 | grep "^talm:" +# Expected: no warning emitted — operator opt-in to verbatim string. +``` + +Regression anchors: + +- Warning fires for IPv4 (`10.0.0.1`, `192.0.2.1/24`) and v-prefixed semver (`v1.2.3`, `v1.2`). Bare two-component decimals (`1.5`, `2.0`) and colon-separated literals (`00:11:22:33:44:55`) do NOT trigger — those aren't strvals-coerced footguns. +- Warning is non-fatal — command still proceeds. A regression that turns it into a hard error would block legitimate `--set` flows. + ### J0-2. `--set-literal` keeps dotted keys intact ```bash @@ -592,12 +715,12 @@ Expected: `warning: pre-flight: configured talosVersion=v1.13 is newer than the ### K1-pre. Phase 2C version-verify catches silent rollback -⚠️ Same destructive setup as K2, but the gate now does the work automatically. **Heads-up**: the target image lives in the node body (`nodes/.yaml`'s `machine.install.image`), not in `values.yaml` — talm's upgrade wrapper reads it from the rendered config patch, not the chart values overlay (see #176). +⚠️ Same destructive setup as K2, but the gate now does the work automatically. **Heads-up**: the upgrade wrapper resolves the target image from `values.yaml::image` (rendering the chart against current values), not from the rendered config that lives in `nodes/.yaml`. Bump the image in `values.yaml` to trigger an intentionally-bad cross-vendor upgrade. Run an intentionally-bad cross-vendor upgrade and expect a hint-bearing blocker: ```bash -sed -i 's|cozystack/cozystack/talos:v1.12.6|siderolabs/installer:v1.13.0|' nodes/node0.yaml +sed -i 's|cozystack/cozystack/talos:v1.12.6|siderolabs/installer:v1.13.0|' values.yaml talm upgrade -f nodes/node0.yaml ``` @@ -674,7 +797,7 @@ Expected: no version-mismatch warning (chart contract matches running). Drift pr ### K5. Phase 2B on a heterogeneous cluster (mid-rollout) -Between K2-step-1 (node0 upgraded) and K2-step-2 (node1 still on old version), Phase 1 still resolves `lookup "links"` (non-Sensitive COSI resource works on both versions). Phase 2A diffs against the specific node, so the cross-version state is per-node, not cluster-wide. Phase 2B (if enabled) compares against the bytes sent; expect cert-hash false-positives until the allowlist lands (see open question in #172). +Between K2-step-1 (node0 upgraded) and K2-step-2 (node1 still on old version), Phase 1 still resolves `lookup "links"` (non-Sensitive COSI resource works on both versions). Phase 2A diffs against the specific node, so the cross-version state is per-node, not cluster-wide. Phase 2B (if enabled) compares against the bytes sent; expect cert-hash false-positives until the Talos-mutated-field allowlist lands. ## L. Extended diagnostics + service control @@ -790,7 +913,38 @@ echo "machine: {type: controlplane}" >> /tmp/_bad.yaml rm /tmp/_bad.yaml ``` -Expected: `error parsing JSON array for key nodes` with a hint about the expected syntax. +Expected: `error parsing JSON array for key nodes` with a hint about the expected syntax. The error surfaces the modeline parse step — the file is NOT silently routed to direct-patch mode where the malformed modeline would be invisible. + +### M1a. Orphan file (no modeline) inside project root — classified as orphan, not malformed + +```bash +echo "# operator note above plain YAML" > nodes/_orphan-commented.yaml +echo "machine: {}" >> nodes/_orphan-commented.yaml +/tmp/talm-safety apply --dry-run --nodes $NODE --endpoints $NODE -f nodes/_orphan-commented.yaml +rm nodes/_orphan-commented.yaml +``` + +Expected: dispatch into direct-patch mode — progress line `- talm: file=nodes/_orphan-commented.yaml, nodes=[$NODE], endpoints=[$NODE]` followed by drift preview. NO "parsing modeline" error, NO "non-comment line found before modeline" error. The classifier (`FindAndParseModeline`) must distinguish "no `# talm:` line anywhere" (orphan, → `ErrModelineNotFound`) from "modeline present but parse failed" (malformed). + +Note: the orphan must live inside the project tree (e.g. under `nodes/`) so root detection finds the project's Chart.yaml / secrets.yaml. An orphan placed at `/tmp/` is rejected at root detection before the modeline classifier runs — see C6b-pre for that path. + +### M1b. Misplaced modeline (modeline below YAML) — rejected with a specific hint + +```bash +cat > nodes/_misplaced.yaml <<'EOF' +machine: + type: controlplane +# talm: nodes=["1.2.3.4"], endpoints=["1.2.3.4"], templates=["templates/controlplane.yaml"] +EOF +/tmp/talm-safety apply --dry-run -f nodes/_misplaced.yaml +rm nodes/_misplaced.yaml +``` + +Expected: error `modeline found below non-comment content: first non-comment line was "machine:"` with hint `the # talm: … modeline must precede any YAML content; only #-prefixed comments and blank lines may sit above it`. Distinct from both M1 (malformed value) and M1a (orphan). + +Regression anchor 1: the classifier lookahead must scan the entire file for a `# talm:` line before deciding orphan vs misplaced. A regression that returns `ErrModelineNotFound` on first non-comment encounter (without lookahead) would silently route misplaced-modeline files into direct-patch mode, hiding the operator error. + +Regression anchor 2: the lookahead is column-0-strict. An indented `# talm:` line inside a YAML body (e.g. ` # talm: see docs for nodes/templates wiring`) is operator-authored documentation, NOT a misplaced modeline; it MUST NOT trip the misplaced-modeline error. Verify by writing a node file with ` # talm: comment` indented under a YAML key and confirming the file routes to orphan / direct-patch mode rather than failing on misplaced-modeline. ### M2. Malformed patch (string-where-map) diff --git a/main.go b/main.go index 16b39681..b32edb3f 100644 --- a/main.go +++ b/main.go @@ -114,6 +114,14 @@ func registerRootFlags(cmd *cobra.Command) { cmd.PersistentFlags().StringVar(&commands.GlobalArgs.Cluster, "cluster", "", "Cluster to connect to if a proxy endpoint is used.") cmd.PersistentFlags().BoolVar(&commands.GlobalArgs.SkipVerify, "skip-verify", false, "skip TLS certificate verification (keeps client authentication)") cmd.PersistentFlags().Bool("version", false, "Print the version number of the application") + + // Shell completion for root persistent flags. --nodes / + // --endpoints draw from the in-scope talosconfig contexts. + // --talosconfig is not wired here — talosconfig has no fixed + // extension and cobra's default file completion is already + // the right shape for picking the file by hand. + _ = cmd.RegisterFlagCompletionFunc("nodes", commands.CompleteTalosconfigNodes) + _ = cmd.RegisterFlagCompletionFunc("endpoints", commands.CompleteTalosconfigEndpoints) } func Execute() error { @@ -128,8 +136,8 @@ func Execute() error { } errorString := err.Error() - //nolint:godox // tracked as #153-followup; cobra validation returns plain fmt.Errorf without a typed error, requires substring matching until cobra ships sentinel errors. - // FIXME(#153-followup): cobra arg/flag validation returns plain + //nolint:godox // cobra validation returns plain fmt.Errorf without a typed error; substring matching is the only way to distinguish those from talm's own errors until cobra ships sentinel errors. + // FIXME: cobra arg/flag validation returns plain // fmt.Errorf without a typed error; substring-matching the // rendered message is the only way to distinguish those from // our own errors today. Track a refactor to wrap cobra diff --git a/pkg/applycheck/diff.go b/pkg/applycheck/diff.go index d72691ac..cd007b0f 100644 --- a/pkg/applycheck/diff.go +++ b/pkg/applycheck/diff.go @@ -101,7 +101,7 @@ type Change struct { // with different key ordering or whitespace produces the same Diff // output. Talos-mutated leaf fields (cert hashes etc.) currently land // as OpUpdate entries with FieldChange path pinpointing what differs; -// an allowlist will be layered on later (see open question in #172). +// an allowlist for Talos-mutated leaf fields will be layered on later. func Diff(current, desired []byte) ([]Change, error) { currentDocs, err := parseDocs(current) if err != nil { diff --git a/pkg/commands/apply.go b/pkg/commands/apply.go index 8030bb71..77d5f174 100644 --- a/pkg/commands/apply.go +++ b/pkg/commands/apply.go @@ -25,6 +25,7 @@ import ( "github.com/cockroachdb/errors" "github.com/cozystack/talm/pkg/engine" + "github.com/cozystack/talm/pkg/modeline" "github.com/spf13/cobra" "google.golang.org/grpc/metadata" "google.golang.org/protobuf/types/known/durationpb" @@ -46,6 +47,16 @@ const parentDir = ".." // share a single canonical value. const applyCommandName = "talm apply" +// roleAnchor and roleSidePatch label the role of a config file in +// the apply chain for operator-facing error messages +// (rejectMultiNodeOverlayFiles). Hoisted to constants so goconst +// stays clean across the applyCmd.Long block, which mentions the +// "anchor" concept several times in operator-facing prose. +const ( + roleAnchor = "anchor" + roleSidePatch = "side-patch" +) + //nolint:gochecknoglobals // cobra command flag struct, idiomatic for cobra-based CLIs var applyCmdFlags struct { helpers.Mode @@ -74,8 +85,28 @@ var applyCmdFlags struct { var applyCmd = &cobra.Command{ Use: "apply", Short: "Apply config to a Talos node", - Long: ``, - Args: cobra.NoArgs, + Long: `Apply rendered configuration to a Talos node. + +Multi-file invocation (anchor + side-patches): the FIRST -f file is the +anchor — it must carry a "# talm: nodes=[…], templates=[…]" modeline and +live under a project root (Chart.yaml + secrets.yaml). Subsequent -f +files are side-patches stacked on top of the anchor's rendered config in +the order they appear; each is merged via the same overlay mechanism the +anchor's node body uses. A single ApplyConfiguration is issued per node +with the composed result. + +Examples: + + # Single node file (anchor only): + talm apply -f nodes/cp01.yaml + + # Stacked side-patches (e.g. one-shot debug overlay): + talm apply -f nodes/cp01.yaml -f /tmp/debug-kubelet.yaml + + # Side-patches do NOT need to live under the project root; the + # first file anchors detection. Reversing the order is an error + # (the orphan path has no project to anchor on).`, + Args: cobra.NoArgs, PreRunE: func(cmd *cobra.Command, _ []string) error { if !cmd.Flags().Changed("talos-version") { applyCmdFlags.talosVersion = Config.TemplateOptions.TalosVersion @@ -132,16 +163,20 @@ func apply() error { return err } - for _, configFile := range expandedFiles { - err = applyOneFile(configFile) - if err != nil { - return err - } - - resetGlobalArgsBetweenFiles(applyCmdFlags.nodesFromArgs, applyCmdFlags.endpointsFromArgs) + // Chain semantics: the first -f file anchors the project root + // and any subsequent files are side-patches stacked on top of + // the first file's rendered config; the result is a single + // ApplyConfiguration per node carrying the composed bundle. + // Talos replaces the whole MachineConfig per call, so per-file + // independent applies (as an earlier design did) would have the + // second apply overwrite everything the first wrote. Multiple + // nodes are still targetable from one invocation via the first + // file's modeline carrying `nodes=[…]`. + if len(expandedFiles) == 0 { + return nil } - return nil + return applyOneFile(expandedFiles[0], expandedFiles[1:]) } // resetGlobalArgsBetweenFiles wipes the per-file GlobalArgs.Nodes / @@ -176,50 +211,233 @@ func resetGlobalArgsBetweenFiles(nodesFromArgs, endpointsFromArgs bool) { } } -// applyOneFile dispatches a single config file through either the -// template-rendering path (when its modeline references templates) -// or the direct-patch path (otherwise). Splitting the per-file work -// out of the outer apply loop keeps the cognitive complexity of -// either half within the linter's gate. -func applyOneFile(configFile string) error { - modelineTemplates, err := processModelineAndUpdateGlobals(configFile, applyCmdFlags.nodesFromArgs, applyCmdFlags.endpointsFromArgs, true) - if err != nil { +// applyOneFile dispatches the apply pipeline for the anchor +// configFile (file[0] of the -f chain) and an optional ordered list +// of sidePatches (file[1:]). The anchor MUST be modelined when +// sidePatches are present — side-patches are stacked on top of the +// rendered config, which requires the chart templates from the +// anchor's modeline. Single-file invocations (no side-patches) +// dispatch by file shape: modelined → template path, non-modelined +// → direct-patch path. +// +// Stack semantics: the anchor's chart + body + each sidePatch in +// order produces ONE composed config per node; a single +// ApplyConfiguration call is issued per node. Any design that +// iterated each file as an independent apply would have the second +// apply overwrite everything the first wrote, since Talos replaces +// the whole MachineConfig per call. +func applyOneFile(configFile string, sidePatches []string) error { + // Reject modelined files in the side-patch slots before any + // dispatch decisions. The apply chain treats file[0] as the + // anchor and file[1:] as bytes-level patches stacked onto the + // anchor's render — if a side-patch carries its own modeline, + // its nodes/templates/endpoints are silently ignored because + // MergeFileAsPatch treats `# talm: …` as a YAML comment. + // Operators expecting per-file independent applies (the + // kubectl-style multi-node shape) would silently target only + // the first file's nodes; gate that here so the misuse is + // loud, with a hint pointing at the correct per-file loop. + if err := rejectModelinedSidePatches(sidePatches); err != nil { return err } + + hasModeline, modelineErr := fileHasTalmModeline(configFile) + // Malformed modeline (typo in keys, bad JSON value, non-comment + // line before the modeline) MUST surface to the operator — + // silently routing onto the direct-patch path here would hide + // the real cause and produce a misleading "no nodes" hint later. + // Only the "no `# talm:` line in file" case (ErrModelineNotFound) + // is allowed to fall through as a side-patch-shaped input. + if modelineErr != nil { + return errors.Wrapf(modelineErr, "parsing modeline in %s", configFile) + } + // Resolve secrets.yaml path relative to project root if not absolute withSecretsPath := ResolveSecretsPath(applyCmdFlags.withSecrets) + if !hasModeline { + if len(sidePatches) > 0 { + //nolint:wrapcheck // cockroachdb/errors.WithHint at boundary. + return errors.WithHint( + errors.Newf("first -f file %s lacks a modeline; side-patches require a modelined anchor", configFile), + "the first -f file in a chain must carry a `# talm: nodes=[…], templates=[…]` modeline so talm knows what to render before stacking side-patches on top", + ) + } + + // Targets may come from --nodes OR from the talosconfig + // context (resolved inside applyOneFileDirectPatchMode via + // client.GetConfigContext().Nodes). Defer the "no nodes + // anywhere" check to the apply path itself so the + // talosconfig-default flow stays reachable. + return applyOneFileDirectPatchMode(configFile, withSecretsPath) + } + + // Modelined anchor: parse modeline, populate GlobalArgs, then + // dispatch into template-mode (chart render + body + side-patches) + // or direct-patch-mode (no templates declared). + resetGlobalArgsBetweenFiles(applyCmdFlags.nodesFromArgs, applyCmdFlags.endpointsFromArgs) + + modelineTemplates, err := processModelineAndUpdateGlobals(configFile, applyCmdFlags.nodesFromArgs, applyCmdFlags.endpointsFromArgs, true) + if err != nil { + return err + } + if len(modelineTemplates) > 0 { - return applyOneFileTemplateMode(configFile, modelineTemplates, withSecretsPath) + return applyOneFileTemplateMode(configFile, sidePatches, modelineTemplates, withSecretsPath) + } + + if len(sidePatches) > 0 { + //nolint:wrapcheck // cockroachdb/errors.WithHint at boundary. + return errors.WithHint( + errors.Newf("anchor %s declares no templates; side-patches require chart rendering", configFile), + "add `templates=[\"templates/.yaml\"]` to the first file's modeline so the chain has a base config to stack onto", + ) } return applyOneFileDirectPatchMode(configFile, withSecretsPath) } +// fileHasTalmModeline classifies configFile into one of three +// shapes by attempting to parse its modeline: +// +// - (true, nil) — file has a well-formed `# talm: …` modeline +// - (false, nil) — file has no `# talm:` line at all (the +// side-patch input shape — caller routes to direct-patch) +// - (false, err) — file has a modeline candidate but parsing it +// failed (typo in keys, bad JSON value, non-comment YAML +// before the modeline). The caller surfaces err to the +// operator so the real cause is visible. +// +// Distinguishing "no modeline" from "malformed modeline" is the +// reason this helper exists. Returning a bare bool would conflate +// the two, silently routing typoed modelines onto the direct-patch +// path where the operator would see a misleading "no nodes" hint +// instead of "parsing modeline: invalid JSON". +func fileHasTalmModeline(configFile string) (bool, error) { + _, _, err := modeline.FindAndParseModeline(configFile) + if err == nil { + return true, nil + } + + if errors.Is(err, modeline.ErrModelineNotFound) { + return false, nil + } + + // The caller (applyOneFile) wraps this with the configFile + // path; FindAndParseModeline already attaches WithHint at its + // boundary, so wrapping again here would double-message. + return false, err //nolint:wrapcheck // forwarded verbatim; caller wraps with configFile context +} + +// rejectMultiNodeOverlayFiles scans the anchor and every side-patch +// for a per-node body overlay (hostname, address, VIP, machine +// identifier) when the apply targets more than one node. The +// anchor's overlay check is the original guard against an +// operator pointing one node file at N nodes; the side-patch +// check is the same guard extended to the chain. A side-patch is +// stamped identically onto every node in the chain, so a per-node +// field inside it produces the same N-machines-same-hostname +// footgun the anchor check was added to prevent. +func rejectMultiNodeOverlayFiles(configFile string, sidePatches, nodes []string) error { + candidates := append([]string{configFile}, sidePatches...) + for _, path := range candidates { + hasOverlay, err := engine.NodeFileHasOverlay(path) + if err != nil { + return errors.Wrapf(err, "checking %q for per-node body overlay", path) + } + + if !hasOverlay { + continue + } + + role := roleAnchor + if path != configFile { + role = roleSidePatch + } + //nolint:wrapcheck // sentinel constructed in-place; WithHintf attaches operator guidance + return errors.WithHintf( + errors.Newf("%s %q would be stamped onto %d nodes (%v) but carries a non-empty per-node body", role, path, len(nodes), nodes), + "split %q into one file per node, or remove the per-node fields if you want the file applied uniformly across each", + path, + ) + } + + return nil +} + +// rejectModelinedSidePatches scans the side-patch slots of an +// apply chain (file[1:] of `talm apply -f anchor -f sideA -f sideB`) +// and rejects the chain when any side-patch carries its own +// modeline. Operators familiar with `kubectl apply -f *.yaml` +// often pass multiple modelined node files expecting independent +// applies — but the new chain semantics treat file[1:] as +// bytes-level patches stacked onto the anchor's render. A side- +// patch's `# talm: …` line is just a YAML comment to +// MergeFileAsPatch, so its nodes/templates/endpoints would be +// silently dropped while its body got merged into the anchor's +// targets. This guard surfaces that misuse loudly with a hint +// pointing at the per-file shell loop, which is the correct shape +// for the multi-node-apply pattern. +// +// A malformed modeline in a side-patch slot also surfaces — the +// operator should fix the typo regardless of whether they intended +// the file as anchor or side-patch. +func rejectModelinedSidePatches(sidePatches []string) error { + for _, path := range sidePatches { + hasModeline, modelineErr := fileHasTalmModeline(path) + if modelineErr != nil { + return errors.Wrapf(modelineErr, "parsing modeline in side-patch %s", path) + } + + if hasModeline { + //nolint:wrapcheck // cockroachdb/errors.WithHint at boundary. + return errors.WithHint( + errors.Newf("side-patch %s carries its own modeline; the apply chain treats only the first -f file as the anchor and stacks subsequent files as side-patches", path), + "to apply multiple modelined node files independently, run one apply per file (e.g. `for f in nodes/*.yaml; do talm apply -f $f; done`); to use this file as a side-patch on the current anchor, strip its `# talm: …` line", + ) + } + } + + return nil +} + // applyOneFileTemplateMode runs the template-rendering apply path for -// a single configFile: renders templates per node, overlays the node -// file body, and ApplyConfigurations the result. See -// applyTemplatesPerNode for why the loop is mandatory. -func applyOneFileTemplateMode(configFile string, modelineTemplates []string, withSecretsPath string) error { +// a single rooted configFile and an optional chain of sidePatches. +// Per node: renders templates, overlays the node body, then stacks +// every sidePatch in order via engine.MergeFileAsPatch — a single +// ApplyConfiguration is issued per node with the final composed +// config. Empty sidePatches reproduces the single-file shape. See +// applyTemplatesPerNode for the per-node loop. +func applyOneFileTemplateMode(configFile string, sidePatches, modelineTemplates []string, withSecretsPath string) error { opts := buildApplyRenderOptions(modelineTemplates, withSecretsPath) nodes := append([]string(nil), GlobalArgs.Nodes...) - //nolint:forbidigo // CLI progress line surfaces the file-to-target mapping for the operator - fmt.Printf("- talm: file=%s, nodes=%s, endpoints=%s\n", configFile, nodes, GlobalArgs.Endpoints) + // Elide the `side-patches=` segment in the single-file case so + // the conventional path's progress line stays uncluttered. The + // dominant invocation shape is `talm apply -f nodes/.yaml` + // without side-patches; reporting `side-patches=[]` on every line + // was visible noise without operator value. + if len(sidePatches) == 0 { + //nolint:forbidigo // CLI progress line surfaces the file-to-target mapping for the operator + fmt.Printf("- talm: file=%s, nodes=[%s], endpoints=[%s]\n", configFile, strings.Join(nodes, ","), strings.Join(GlobalArgs.Endpoints, ",")) + } else { + //nolint:forbidigo // CLI progress line surfaces the file-to-target mapping for the operator + fmt.Printf("- talm: file=%s, side-patches=[%s], nodes=[%s], endpoints=[%s]\n", configFile, strings.Join(sidePatches, ","), strings.Join(nodes, ","), strings.Join(GlobalArgs.Endpoints, ",")) + } applyClosure := buildApplyClosure() if applyCmdFlags.insecure { openClient := openClientPerNodeMaintenance(applyCmdFlags.certFingerprints, WithClientMaintenance) - return applyTemplatesPerNode(opts, configFile, nodes, openClient, engine.Render, applyClosure) + return applyTemplatesPerNode(opts, configFile, sidePatches, nodes, openClient, engine.Render, applyClosure) } return withApplyClientBare(func(parentCtx context.Context, c *client.Client) error { resolved := resolveAuthTemplateNodes(nodes, c) openClient := openClientPerNodeAuth(parentCtx, c) - return applyTemplatesPerNode(opts, configFile, resolved, openClient, engine.Render, applyClosure) + return applyTemplatesPerNode(opts, configFile, sidePatches, resolved, openClient, engine.Render, applyClosure) }) } @@ -295,6 +513,32 @@ func buildApplyClosure() applyFunc { // slice length — see cosiPreflightContext for the auth path's // workaround that has to scope ctx back to "node" before calling the // same COSI preflight). +// resolveDirectPatchTargetNodes resolves the per-node target list +// for the direct-patch apply path. Order of precedence: --nodes +// (GlobalArgs.Nodes) first, falling back to the active talosconfig +// context's Nodes slice. Returns an operator-facing error when +// both sources are empty — the direct-patch path has no modeline +// to read targets from, so a missing --nodes AND a context with +// no Nodes leaves nowhere to send the apply. +func resolveDirectPatchTargetNodes(c *client.Client, configFile string) ([]string, error) { + targetNodes := append([]string(nil), GlobalArgs.Nodes...) + if len(targetNodes) == 0 { + if cfg := c.GetConfigContext(); cfg != nil { + targetNodes = append(targetNodes, cfg.Nodes...) + } + } + + if len(targetNodes) == 0 { + //nolint:wrapcheck // cockroachdb/errors.WithHint at boundary. + return nil, errors.WithHint( + errors.Newf("no nodes to target for %s", configFile), + "the file lacks a `# talm: nodes=[…]` modeline, no --nodes flag was passed, and the active talosconfig context carries no nodes; pass --nodes explicitly or supply a modelined node file", + ) + } + + return targetNodes, nil +} + func applyOneFileDirectPatchMode(configFile, withSecretsPath string) error { opts := buildApplyPatchOptions(withSecretsPath) patches := []string{"@" + configFile} @@ -318,19 +562,13 @@ func applyOneFileDirectPatchMode(configFile, withSecretsPath string) error { } return withApplyClient(func(ctx context.Context, c *client.Client) error { - // wrapWithNodeContext fills ctx via client.WithNodes from - // talosconfig when --nodes is omitted, but does not mutate - // GlobalArgs.Nodes. Mirror its resolution here so the log line - // and the per-node preflight loop see the actual targets. - targetNodes := append([]string(nil), GlobalArgs.Nodes...) - if len(targetNodes) == 0 { - if cfg := c.GetConfigContext(); cfg != nil { - targetNodes = append(targetNodes, cfg.Nodes...) - } + targetNodes, err := resolveDirectPatchTargetNodes(c, configFile) + if err != nil { + return err } //nolint:forbidigo // CLI progress line surfaces the file-to-target mapping for the operator - fmt.Printf("- talm: file=%s, nodes=%s, endpoints=%s\n", configFile, targetNodes, GlobalArgs.Endpoints) + fmt.Printf("- talm: file=%s, nodes=[%s], endpoints=[%s]\n", configFile, strings.Join(targetNodes, ","), strings.Join(GlobalArgs.Endpoints, ",")) read := cosiVersionReader(c) @@ -467,9 +705,9 @@ func shouldRunPostApplyVerify(mode machineapi.ApplyConfigurationRequest_Mode, dr // // Cost: AUTO applies that DON'T require a reboot lose their // post-apply verify. Acceptable trade-off: the verify is - // default-off until the Talos-mutated-field allowlist lands - // (see #172), and an operator who needs verify-on-no-reboot - // can pass --mode=no-reboot explicitly. + // default-off pending a Talos-mutated-field allowlist, and + // an operator who needs verify-on-no-reboot can pass + // --mode=no-reboot explicitly. machineapi.ApplyConfigurationRequest_AUTO: return false case machineapi.ApplyConfigurationRequest_NO_REBOOT: @@ -544,6 +782,7 @@ type openClientFunc func(node string, action func(ctx context.Context, c *client func applyTemplatesPerNode( opts engine.Options, configFile string, + sidePatches []string, nodes []string, openClient openClientFunc, render renderFunc, @@ -560,26 +799,20 @@ func applyTemplatesPerNode( // pin. Replaying it across multiple targets would stamp the same // value onto every machine, so reject this combination early and // ask the user to split the file. Modeline-only files (no body) - // are fine — they just carry the target list. + // are fine — they just carry the target list. Side-patches are + // subject to the same rule: they are stacked identically onto + // every node, so a per-node field inside a side-patch is the + // same multi-node-stamp footgun as a per-node field inside the + // anchor. if len(nodes) > 1 { - hasOverlay, err := engine.NodeFileHasOverlay(configFile) - if err != nil { - return errors.Wrapf(err, "checking %q for per-node body overlay", configFile) - } - - if hasOverlay { - //nolint:wrapcheck // sentinel constructed in-place; WithHintf attaches operator guidance - return errors.WithHintf( - errors.Newf("node file %q targets %d nodes (%v) but carries a non-empty per-node body", configFile, len(nodes), nodes), - "split %q into one file per node, or remove the per-node fields if you want the rendered template alone applied to each", - configFile, - ) + if err := rejectMultiNodeOverlayFiles(configFile, sidePatches, nodes); err != nil { + return err } } for _, node := range nodes { err := openClient(node, func(ctx context.Context, c *client.Client) error { - return renderMergeAndApply(ctx, c, opts, configFile, render, apply) + return renderMergeAndApply(ctx, c, opts, configFile, sidePatches, render, apply) }) if err != nil { return errors.Wrapf(err, "node %s", node) @@ -728,10 +961,15 @@ func resolveAuthTemplateNodes(cliNodes []string, c *client.Client) []string { return append([]string(nil), cfg.Nodes...) } -// renderMergeAndApply is the per-node body shared by every apply mode. +// renderMergeAndApply is the per-node body shared by every apply +// mode. sidePatches is the chain of additional -f files stacked on +// top of the rooted configFile. Each is merged in order via +// engine.MergeFileAsPatch; the result is a single composed config +// applied with one ApplyConfiguration call per node. Empty slice +// (single -f file) is a single-merge shape. // //nolint:gocritic // opts taken by value to mirror applyTemplatesPerNode's test-injection signature -func renderMergeAndApply(ctx context.Context, c *client.Client, opts engine.Options, configFile string, render renderFunc, apply applyFunc) error { +func renderMergeAndApply(ctx context.Context, c *client.Client, opts engine.Options, configFile string, sidePatches []string, render renderFunc, apply applyFunc) error { rendered, err := render(ctx, c, opts) if err != nil { //nolint:wrapcheck // already wrapped via errors.Wrap, WithHint adds operator-facing guidance @@ -746,6 +984,13 @@ func renderMergeAndApply(ctx context.Context, c *client.Client, opts engine.Opti return errors.Wrapf(err, "merging node file %q as patch", configFile) } + for _, sidePatch := range sidePatches { + merged, err = engine.MergeFileAsPatch(merged, sidePatch) + if err != nil { + return errors.Wrapf(err, "merging side-patch %q onto rendered config", sidePatch) + } + } + return apply(ctx, c, merged) } @@ -882,7 +1127,7 @@ func resolveTemplatePaths(templates []string, rootDir string) []string { func init() { applyCmd.Flags().BoolVarP(&applyCmdFlags.insecure, "insecure", "i", false, "apply using the insecure (encrypted with no auth) maintenance service") - applyCmd.Flags().StringSliceVarP(&applyCmdFlags.configFiles, "file", "f", nil, "specify config files or patches in a YAML file (can specify multiple)") + applyCmd.Flags().StringSliceVarP(&applyCmdFlags.configFiles, "file", "f", nil, "node config files / patches (`.yaml` / `.yml`; shell completion narrows to these extensions). First -f is the modelined anchor (must live under a `talm init`'d project root); subsequent -f files are side-patches stacked onto the anchor's rendered config and may live anywhere.") applyCmd.Flags().StringVar(&applyCmdFlags.talosVersion, "talos-version", "", "the desired Talos version to generate config for (backwards compatibility, e.g. v0.8)") applyCmd.Flags().StringVar(&applyCmdFlags.withSecrets, "with-secrets", "", "use a secrets file generated using 'gen secrets'") applyCmd.Flags().StringVar(&applyCmdFlags.kubernetesVersion, "kubernetes-version", constants.DefaultKubernetesVersion, "desired kubernetes version to run") @@ -893,9 +1138,20 @@ func init() { applyCmd.Flags().BoolVar(&applyCmdFlags.force, "force", false, "will overwrite existing files") applyCmd.Flags().BoolVar(&applyCmdFlags.skipResourceValidation, "skip-resource-validation", false, "skip the pre-apply check that declared host resources (links, disks) exist on the target node") applyCmd.Flags().BoolVar(&applyCmdFlags.skipDriftPreview, "skip-drift-preview", false, "skip the pre-apply diff of on-node vs rendered MachineConfig") - applyCmd.Flags().BoolVar(&applyCmdFlags.skipPostApplyVerify, "skip-post-apply-verify", true, "skip the post-apply structural verification of on-node vs sent MachineConfig (default skip until the Talos-mutated field allowlist lands; see #172)") + applyCmd.Flags().BoolVar(&applyCmdFlags.skipPostApplyVerify, "skip-post-apply-verify", true, "skip the post-apply structural verification of on-node vs sent MachineConfig (default skip until the Talos-mutated field allowlist lands)") applyCmd.Flags().BoolVar(&applyCmdFlags.showSecretsInDrift, "show-secrets-in-drift", false, "show secret-bearing field values verbatim in drift preview / post-apply verify output (default: redacted; cluster.token, cluster.ca.key, machine.token, Wireguard private keys, etc.)") helpers.AddModeFlags(&applyCmdFlags.Mode, applyCmd) + // Shell completion for `talm apply` flags. `--file` returns the + // modelined-yaml files under /nodes/ via completeNodeFiles + // — these are the canonical anchor inputs. The operator-typed + // `-f ` lands on the curated list rather than every yaml + // in CWD. ValidArgsFunction is NOT wired because applyCmd + // declares cobra.NoArgs, which suppresses positional completion + // in cobra's __complete path; wiring it would pin dead surface. + _ = applyCmd.RegisterFlagCompletionFunc("mode", completeApplyMode) + _ = applyCmd.RegisterFlagCompletionFunc("file", completeNodeFiles) + _ = applyCmd.RegisterFlagCompletionFunc("with-secrets", completeYAMLFiles) + addCommand(applyCmd) } diff --git a/pkg/commands/apply_test.go b/pkg/commands/apply_test.go index 74ef1dbd..997a1307 100644 --- a/pkg/commands/apply_test.go +++ b/pkg/commands/apply_test.go @@ -380,7 +380,7 @@ func TestApplyTemplatesPerNode_LoopsOncePerNodeWithSingleNodeContext(t *testing. return nil } - if err := applyTemplatesPerNode(engine.Options{}, configFile, want, fakeAuthOpenClient(context.Background()), render, apply); err != nil { + if err := applyTemplatesPerNode(engine.Options{}, configFile, nil, want, fakeAuthOpenClient(context.Background()), render, apply); err != nil { t.Fatalf("applyTemplatesPerNode: %v", err) } @@ -425,7 +425,7 @@ func TestApplyTemplatesPerNode_NeverBatchesNodes(t *testing.T) { return nil } - if err := applyTemplatesPerNode(engine.Options{}, configFile, want, fakeAuthOpenClient(context.Background()), render, apply); err != nil { + if err := applyTemplatesPerNode(engine.Options{}, configFile, nil, want, fakeAuthOpenClient(context.Background()), render, apply); err != nil { t.Fatalf("applyTemplatesPerNode: %v", err) } if renderCount != len(want) { @@ -462,20 +462,69 @@ machine: return nil } - err := applyTemplatesPerNode(engine.Options{}, configFile, + err := applyTemplatesPerNode(engine.Options{}, configFile, nil, []string{testNodeAddrA, testNodeAddrB}, fakeAuthOpenClient(context.Background()), render, apply) if err == nil { t.Fatal("expected an error for multi-node + non-empty body, got nil") } msg := err.Error() - for _, want := range []string{"node file", "2 nodes", "per-node body"} { + for _, want := range []string{"anchor", "2 nodes", "per-node body"} { if !strings.Contains(msg, want) { t.Errorf("error message %q does not mention %q", msg, want) } } } +// TestApplyTemplatesPerNode_MultiNodeWithSidePatchOverlayIsRejected +// pins the extension of the per-node-body guard to the side-patch +// slot: a chain like `apply -f modeline-only.yaml -f side.yaml` +// targeting N>1 nodes, where side.yaml carries a per-node field +// (hostname, address, VIP), must be rejected — the side-patch is +// stamped identically onto every node so a per-node field inside +// it produces the same "all N machines named the same hostname" +// footgun the anchor check was added to prevent. +func TestApplyTemplatesPerNode_MultiNodeWithSidePatchOverlayIsRejected(t *testing.T) { + dir := t.TempDir() + + // Anchor is modeline-only (no per-node body) — passes the + // anchor check by itself. + anchor := filepath.Join(dir, "anchor.yaml") + if err := os.WriteFile(anchor, []byte(`# talm: nodes=["10.0.0.1","10.0.0.2"]`+"\n"), 0o644); err != nil { + t.Fatalf("write anchor: %v", err) + } + + // Side-patch carries a per-node field. Pre-extension this + // slipped through the guard and got stamped onto every node. + side := filepath.Join(dir, "side.yaml") + if err := os.WriteFile(side, []byte("machine:\n network:\n hostname: only-valid-for-one-node\n"), 0o644); err != nil { + t.Fatalf("write side: %v", err) + } + + render := func(_ context.Context, _ *client.Client, _ engine.Options) ([]byte, error) { + t.Fatal("render must not be called when the side-patch overlay guard trips") + return nil, nil + } + apply := func(_ context.Context, _ *client.Client, _ []byte) error { + t.Fatal("apply must not be called when the side-patch overlay guard trips") + return nil + } + + err := applyTemplatesPerNode(engine.Options{}, anchor, []string{side}, + []string{testNodeAddrA, testNodeAddrB}, + fakeAuthOpenClient(context.Background()), render, apply) + if err == nil { + t.Fatal("expected error: side-patch with per-node body must be rejected on a multi-node chain") + } + + msg := err.Error() + for _, want := range []string{"side-patch", "2 nodes", "per-node body"} { + if !strings.Contains(msg, want) { + t.Errorf("error %q must mention %q", msg, want) + } + } +} + // TestApplyTemplatesPerNode_MultiNodeEmptyBodyIsAllowed pins that the // overlay guard does NOT trip on a modeline-only node file (the common // bootstrap shape: one file drives the same template across N nodes @@ -498,7 +547,7 @@ func TestApplyTemplatesPerNode_MultiNodeEmptyBodyIsAllowed(t *testing.T) { return nil } - if err := applyTemplatesPerNode(engine.Options{}, configFile, + if err := applyTemplatesPerNode(engine.Options{}, configFile, nil, []string{testNodeAddrA, testNodeAddrB}, fakeAuthOpenClient(context.Background()), render, apply); err != nil { t.Fatalf("applyTemplatesPerNode: %v", err) @@ -527,7 +576,7 @@ func TestApplyTemplatesPerNode_NoNodesIsAnError(t *testing.T) { return nil } - err := applyTemplatesPerNode(engine.Options{}, configFile, nil, fakeAuthOpenClient(context.Background()), render, apply) + err := applyTemplatesPerNode(engine.Options{}, configFile, nil, nil, fakeAuthOpenClient(context.Background()), render, apply) if err == nil { t.Fatal("expected an error for empty nodes list, got nil") } @@ -589,7 +638,7 @@ func TestApplyTemplatesPerNode_MaintenanceModeOpensFreshClientPerNode(t *testing return nil } - if err := applyTemplatesPerNode(engine.Options{}, configFile, want, openClient, render, apply); err != nil { + if err := applyTemplatesPerNode(engine.Options{}, configFile, nil, want, openClient, render, apply); err != nil { t.Fatalf("applyTemplatesPerNode: %v", err) } if !slices.Equal(clientOpenedFor, want) { @@ -717,7 +766,7 @@ func TestApplyTemplatesPerNode_AuthModeUsesPluralNodesMetadataKey(t *testing.T) apply := func(_ context.Context, _ *client.Client, _ []byte) error { return nil } openClient := openClientPerNodeAuth(context.Background(), nil) - if err := applyTemplatesPerNode(engine.Options{}, configFile, []string{node}, openClient, render, apply); err != nil { + if err := applyTemplatesPerNode(engine.Options{}, configFile, nil, []string{node}, openClient, render, apply); err != nil { t.Fatalf("applyTemplatesPerNode: %v", err) } } diff --git a/pkg/commands/completion.go b/pkg/commands/completion.go new file mode 100644 index 00000000..534a3736 --- /dev/null +++ b/pkg/commands/completion.go @@ -0,0 +1,238 @@ +// Copyright Cozystack Authors +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package commands + +import ( + "os" + "path/filepath" + "strings" + + "github.com/cozystack/talm/pkg/generated" + "github.com/cozystack/talm/pkg/modeline" + "github.com/siderolabs/talos/pkg/machinery/client/config" + "github.com/spf13/cobra" +) + +// Field names reused across completion helpers and the shadow-flags +// map in talosctl_wrapper.go. Hoisted to constants to satisfy +// goconst across the package. +const ( + flagNameNodes = "nodes" + flagNameEndpoints = "endpoints" + yamlExt = "yaml" + ymlExt = "yml" + nodesDirName = "nodes" +) + +// applyModeOptions enumerates the apply / patch / edit `--mode` +// values upstream talosctl exposes via helpers.AddModeFlags. Pinned +// here rather than imported because upstream's keys live inside a +// per-call map, not a package-level constant — we cannot reflect +// them at completion time without instantiating the Mode flag. +// +//nolint:gochecknoglobals // immutable lookup table used by completeApplyMode at completion time. +var applyModeOptions = []string{"auto", "no-reboot", "reboot", "staged", "try"} + +// completePresetNames implements shell completion for the `--preset` +// flag of `talm init`: the available presets are baked into the +// binary at build time, so the completion is deterministic and the +// directive disables file-completion fallback. +func completePresetNames(_ *cobra.Command, _ []string, _ string) ([]string, cobra.ShellCompDirective) { + presets, err := generated.AvailablePresets() + if err != nil { + // Surfaces no completions plus the error so the shell + // does not silently fall back to file completion. + return nil, cobra.ShellCompDirectiveError + } + + return presets, cobra.ShellCompDirectiveNoFileComp +} + +// completeApplyMode implements shell completion for the `--mode` +// flag of `talm apply`. Fixed enum, no file fallback. +func completeApplyMode(_ *cobra.Command, _ []string, _ string) ([]string, cobra.ShellCompDirective) { + return applyModeOptions, cobra.ShellCompDirectiveNoFileComp +} + +// completeYAMLFiles implements shell completion for flags that +// accept YAML file paths (`-f / --file`, `--values`, `-t / --template`, +// `--with-secrets`). The directive narrows the file-completion +// fallback to `*.yaml` and `*.yml`. +func completeYAMLFiles(_ *cobra.Command, _ []string, _ string) ([]string, cobra.ShellCompDirective) { + return []string{yamlExt, ymlExt}, cobra.ShellCompDirectiveFilterFileExt +} + +// completeNodeFiles is the ValidArgsFunction for `apply` / `template` +// / `upgrade`'s positional file argument. Walks `nodes/` under the +// detected project root, filters to YAML files whose first line is +// a talm modeline. Operator types `talm upgrade ` and gets the +// list of modelined node files, not every yaml in the project. +func completeNodeFiles(_ *cobra.Command, _ []string, toComplete string) ([]string, cobra.ShellCompDirective) { + root := Config.RootDir + if root == "" { + // Pre-PreRunE invocation (cobra's __complete path may + // fire before our root detection runs). Best-effort + // fallback: walk from CWD. + cwd, err := os.Getwd() + if err != nil { + return nil, cobra.ShellCompDirectiveError + } + + root = cwd + } + + nodesDir := filepath.Join(root, nodesDirName) + + entries, err := os.ReadDir(nodesDir) + if err != nil { + // `nodes/` may not exist on a brand-new project; fall + // back to default file completion so the operator can + // still pick files manually. + return []string{yamlExt, ymlExt}, cobra.ShellCompDirectiveFilterFileExt + } + + var matches []string + + for _, entry := range entries { + if entry.IsDir() { + continue + } + + name := entry.Name() + if !strings.HasSuffix(name, "."+yamlExt) && !strings.HasSuffix(name, "."+ymlExt) { + continue + } + + path := filepath.Join(nodesDirName, name) + if !strings.HasPrefix(path, toComplete) && toComplete != "" { + continue + } + + // FindAndParseModeline accepts leading operator comments; + // using a strict first-line-only parser would hide files + // that `talm template -I` just produced from `talm apply + // ` / `talm upgrade `. Both ErrModelineNotFound + // and malformed-modeline parse errors drop the candidate + // from the completion list — completion must not block on + // individual-file parse failures, and a malformed modeline + // is still not a useful tab target. + if _, _, err := modeline.FindAndParseModeline(filepath.Join(root, path)); err != nil { + continue + } + + matches = append(matches, path) + } + + return matches, cobra.ShellCompDirectiveNoFileComp +} + +// CompleteTalosconfigNodes is the exported entry point for +// completion of the root `--nodes` persistent flag. main.go wires +// it via cobra.RegisterFlagCompletionFunc. +// +//nolint:gochecknoglobals // package-level var holding the closure produced by completeTalosconfigField; the alternative (per-call construction) re-allocates on every shell completion invocation. +var CompleteTalosconfigNodes = completeTalosconfigField(flagNameNodes) + +// CompleteTalosconfigEndpoints is the exported entry point for +// completion of the root `--endpoints` persistent flag. +// +//nolint:gochecknoglobals // see CompleteTalosconfigNodes. +var CompleteTalosconfigEndpoints = completeTalosconfigField(flagNameEndpoints) + +// completeTalosconfigField builds the completion function for the +// `--nodes` / `--endpoints` root persistent flags. Loads the +// in-scope talosconfig and returns the union of the requested field +// across every context. Closes over `field` so the call sites stay +// one-liners. +// +// `field` must be "nodes" or "endpoints"; any other value yields an +// empty completion list (caller error). +func completeTalosconfigField(field string) func(*cobra.Command, []string, string) ([]string, cobra.ShellCompDirective) { + return func(_ *cobra.Command, _ []string, _ string) ([]string, cobra.ShellCompDirective) { + path := resolveTalosconfigPathForCompletion() + if path == "" { + return nil, cobra.ShellCompDirectiveNoFileComp + } + + cfg, err := config.Open(path) + if err != nil { + return nil, cobra.ShellCompDirectiveError + } + + seen := map[string]struct{}{} + out := []string{} + + for _, ctx := range cfg.Contexts { + var values []string + + switch field { + case flagNameNodes: + values = ctx.Nodes + case flagNameEndpoints: + values = ctx.Endpoints + default: + return nil, cobra.ShellCompDirectiveError + } + + for _, item := range values { + if _, dup := seen[item]; dup { + continue + } + + seen[item] = struct{}{} + out = append(out, item) + } + } + + return out, cobra.ShellCompDirectiveNoFileComp + } +} + +// resolveTalosconfigPathForCompletion returns the talosconfig path +// the completion should read. Mirrors the in-process precedence: +// 1. --talosconfig flag (already in GlobalArgs.Talosconfig) +// 2. $TALOSCONFIG env var (cobra parses the flag default into +// GlobalArgs.Talosconfig only if the operator passed --talosconfig +// at invocation; for completion we have to consult the env +// explicitly) +// 3. project-local talosconfig (Config.RootDir/talosconfig) +// +// Returns the empty string if no candidate is readable. Completion +// callers translate empty into an empty completion list rather than +// an error — completion must not block on missing files. +func resolveTalosconfigPathForCompletion() string { + if GlobalArgs.Talosconfig != "" { + if _, err := os.Stat(GlobalArgs.Talosconfig); err == nil { + return GlobalArgs.Talosconfig + } + } + + if env := os.Getenv("TALOSCONFIG"); env != "" { + // $TALOSCONFIG is operator-controlled, not network-tainted; + // the lint is a false positive for env-derived paths. + if _, err := os.Stat(env); err == nil { //nolint:gosec // G304: operator-supplied env, not network-tainted + return env + } + } + + if Config.RootDir != "" { + candidate := filepath.Join(Config.RootDir, "talosconfig") + if _, err := os.Stat(candidate); err == nil { + return candidate + } + } + + return "" +} diff --git a/pkg/commands/completion_test.go b/pkg/commands/completion_test.go new file mode 100644 index 00000000..7dbd7bbe --- /dev/null +++ b/pkg/commands/completion_test.go @@ -0,0 +1,325 @@ +// Copyright Cozystack Authors +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package commands + +import ( + "os" + "path/filepath" + "slices" + "testing" + + "github.com/spf13/cobra" +) + +// TestComplete_ApplyMode_ReturnsFixedEnum pins the apply --mode +// completion: five upstream-supported values, no file fallback. +// Operators get tab-completed enums instead of meaningless file +// suggestions. +func TestComplete_ApplyMode_ReturnsFixedEnum(t *testing.T) { + got, directive := completeApplyMode(nil, nil, "") + + if !slices.Equal(got, applyModeOptions) { + t.Errorf("--mode completion = %v, want %v", got, applyModeOptions) + } + // Sanity: every value upstream's helpers.AddModeFlags + // registers must surface. The string forms are pinned in + // upstream's mode.go (modeAuto, modeNoReboot, …). + if len(got) != 5 { + t.Errorf("--mode completion must surface five upstream-supported modes; got %v", got) + } + // Pin presence of the always-default mode by string so a + // future rename of applyModeOptions still produces a + // detectable failure mode. + if !slices.Contains(got, "auto") { + t.Errorf("--mode completion missing default mode; got %v", got) + } + if directive != cobra.ShellCompDirectiveNoFileComp { + t.Errorf("directive = %v, want ShellCompDirectiveNoFileComp", directive) + } +} + +// TestComplete_YAMLFileExt_FilterDirective pins the file-flag +// completion behaviour: cobra is told to narrow the file +// completion to `.yaml` / `.yml` extensions. +func TestComplete_YAMLFileExt_FilterDirective(t *testing.T) { + got, directive := completeYAMLFiles(nil, nil, "") + + want := []string{yamlExt, ymlExt} + if !slices.Equal(got, want) { + t.Errorf("file extension list = %v, want %v", got, want) + } + if directive != cobra.ShellCompDirectiveFilterFileExt { + t.Errorf("directive = %v, want ShellCompDirectiveFilterFileExt", directive) + } +} + +// TestComplete_PresetNames_ReturnsAvailablePresets pins that +// `talm init --preset ` enumerates the presets baked into +// the binary via pkg/generated. The exact list depends on the +// embedded charts; the test asserts at least one entry and that +// the directive disables file fallback. +func TestComplete_PresetNames_ReturnsAvailablePresets(t *testing.T) { + got, directive := completePresetNames(nil, nil, "") + + if directive != cobra.ShellCompDirectiveNoFileComp { + t.Errorf("directive = %v, want ShellCompDirectiveNoFileComp", directive) + } + if len(got) == 0 { + t.Errorf("expected at least one preset; got empty slice") + } +} + +// TestComplete_PositionalNodeFiles_ReturnsModelinedYamlFromNodesDir +// pins the canonical positional-args completion for `apply` / +// `template` / `upgrade`: only YAML files under `nodes/` whose +// first line is a talm modeline are returned. Files without a +// modeline (e.g. operator-authored notes) are filtered out so +// completion does not point at files the command would reject +// at parse time. +func TestComplete_PositionalNodeFiles_ReturnsModelinedYamlFromNodesDir(t *testing.T) { + rootOrig := Config.RootDir + t.Cleanup(func() { Config.RootDir = rootOrig }) + + dir := t.TempDir() + Config.RootDir = dir + + nodesDir := filepath.Join(dir, "nodes") + if err := os.Mkdir(nodesDir, 0o755); err != nil { + t.Fatal(err) + } + + // Modelined node — should appear in completion. + modelined := "# talm: nodes=[\"1.2.3.4\"], endpoints=[\"1.2.3.4\"], templates=[\"templates/cp.yaml\"]\n" + + "machine:\n type: controlplane\n" + if err := os.WriteFile(filepath.Join(nodesDir, "cp01.yaml"), []byte(modelined), 0o644); err != nil { + t.Fatal(err) + } + + // No modeline — must be filtered out. + plain := "# operator note\nmachine:\n type: controlplane\n" + if err := os.WriteFile(filepath.Join(nodesDir, "note.yaml"), []byte(plain), 0o644); err != nil { + t.Fatal(err) + } + + // Wrong extension — must be filtered out. + if err := os.WriteFile(filepath.Join(nodesDir, "scratch.txt"), []byte("anything"), 0o644); err != nil { + t.Fatal(err) + } + + got, directive := completeNodeFiles(nil, nil, "") + + want := []string{filepath.Join("nodes", "cp01.yaml")} + if !slices.Equal(got, want) { + t.Errorf("positional node-file completion = %v, want %v", got, want) + } + if directive != cobra.ShellCompDirectiveNoFileComp { + t.Errorf("directive = %v, want ShellCompDirectiveNoFileComp", directive) + } +} + +// TestComplete_TalosconfigField_UnionAcrossContexts pins that +// `--nodes` / `--endpoints` completion enumerates the union of +// every context's field across the in-scope talosconfig. The +// operator may have multiple contexts (one per cluster); each +// context's nodes / endpoints must surface so they can be picked +// at tab time. +func TestComplete_TalosconfigField_UnionAcrossContexts(t *testing.T) { + talosconfigOrig := GlobalArgs.Talosconfig + rootOrig := Config.RootDir + t.Cleanup(func() { + GlobalArgs.Talosconfig = talosconfigOrig + Config.RootDir = rootOrig + }) + + dir := t.TempDir() + Config.RootDir = dir + tcPath := filepath.Join(dir, "talosconfig") + + tc := "context: alpha\n" + + "contexts:\n" + + " alpha:\n" + + " endpoints: [10.0.0.1, 10.0.0.2]\n" + + " nodes: [10.0.0.10]\n" + + " beta:\n" + + " endpoints: [10.0.1.1]\n" + + " nodes: [10.0.1.10, 10.0.1.11]\n" + if err := os.WriteFile(tcPath, []byte(tc), 0o600); err != nil { + t.Fatal(err) + } + + GlobalArgs.Talosconfig = tcPath + + gotNodes, dirNodes := CompleteTalosconfigNodes(nil, nil, "") + if dirNodes != cobra.ShellCompDirectiveNoFileComp { + t.Errorf("--nodes directive = %v, want NoFileComp", dirNodes) + } + for _, want := range []string{"10.0.0.10", "10.0.1.10", "10.0.1.11"} { + if !slices.Contains(gotNodes, want) { + t.Errorf("--nodes completion missing %q; got %v", want, gotNodes) + } + } + + gotEndpoints, dirEndpoints := CompleteTalosconfigEndpoints(nil, nil, "") + if dirEndpoints != cobra.ShellCompDirectiveNoFileComp { + t.Errorf("--endpoints directive = %v, want NoFileComp", dirEndpoints) + } + for _, want := range []string{"10.0.0.1", "10.0.0.2", "10.0.1.1"} { + if !slices.Contains(gotEndpoints, want) { + t.Errorf("--endpoints completion missing %q; got %v", want, gotEndpoints) + } + } +} + +// TestComplete_ApplyCmd_HasFlagCompletionForMode pins that the +// init() wiring is in place: `apply --mode` is registered with +// the completeApplyMode function. Regression guard against a +// future refactor that drops a RegisterFlagCompletionFunc line. +func TestComplete_ApplyCmd_HasFlagCompletionForMode(t *testing.T) { + fn, exists := applyCmd.GetFlagCompletionFunc("mode") + if !exists || fn == nil { + t.Fatal("apply --mode missing flag completion registration") + } +} + +// TestComplete_InitCmd_HasFlagCompletionForPreset pins the +// init-side wiring. +func TestComplete_InitCmd_HasFlagCompletionForPreset(t *testing.T) { + fn, exists := initCmd.GetFlagCompletionFunc("preset") + if !exists || fn == nil { + t.Fatal("init --preset missing flag completion registration") + } +} + +// TestComplete_AnchorCommands_FileFlagUsesModelinedCompleter pins +// that `talm apply --file`, `talm template --file`, and +// `talm upgrade --file` register the modelined-aware completer +// (completeNodeFiles) rather than the generic yaml-extension +// filter. The operator typing `-f ` lands on the curated +// list of modelined files under /nodes/, not on every +// .yaml in CWD. The helper produces a directive +// ShellCompDirectiveNoFileComp when matches are found (vs the +// generic completer's ShellCompDirectiveFilterFileExt) — the +// distinction is observable from the registered function's +// behavior on a real project fixture. +func TestComplete_AnchorCommands_FileFlagUsesModelinedCompleter(t *testing.T) { + rootOrig := Config.RootDir + t.Cleanup(func() { Config.RootDir = rootOrig }) + + dir := t.TempDir() + Config.RootDir = dir + + if err := os.Mkdir(filepath.Join(dir, "nodes"), 0o755); err != nil { + t.Fatal(err) + } + + body := "# talm: nodes=[\"1.2.3.4\"], endpoints=[\"1.2.3.4\"], templates=[\"templates/cp.yaml\"]\n" + + "machine:\n type: controlplane\n" + if err := os.WriteFile(filepath.Join(dir, "nodes", "cp01.yaml"), []byte(body), 0o644); err != nil { + t.Fatal(err) + } + + for _, cmd := range []*cobra.Command{applyCmd, templateCmd} { + fn, exists := cmd.GetFlagCompletionFunc("file") + if !exists || fn == nil { + t.Errorf("%q --file completion not registered", cmd.Name()) + + continue + } + + got, directive := fn(cmd, nil, "") + + if !slices.Contains(got, filepath.Join("nodes", "cp01.yaml")) { + t.Errorf("%q --file completion did not surface the modelined node file; got %v", cmd.Name(), got) + } + + if directive != cobra.ShellCompDirectiveNoFileComp { + t.Errorf("%q --file directive = %v, want ShellCompDirectiveNoFileComp (modelined completer surfaces a curated list, not the file-extension filter)", cmd.Name(), directive) + } + } +} + +// TestComplete_RootTalosconfig_NoExplicitCompletion pins that +// --talosconfig is NOT wired with a custom completion function. +// talosconfig has no fixed extension and there is no useful curated +// list to surface (operators pick the file by hand). Wiring an +// extension filter would narrow legitimate paths (no .yaml suffix +// requirement); wiring a curated list would require enumerating +// every kubeconfig-style file on the system. Cobra's default +// filename completion is the correct fallback. This test pins the +// absence so a future "add --talosconfig completion" change must +// be intentional, not accidental. +func TestComplete_RootTalosconfig_NoExplicitCompletion(t *testing.T) { + // Replicate the registration done in main.go's + // registerRootFlags so the test exercises the same surface + // without booting the rootCmd itself. + root := &cobra.Command{Use: "talm-test"} + root.PersistentFlags().String("talosconfig", "", "") + + fn, exists := root.GetFlagCompletionFunc("talosconfig") + if exists && fn != nil { + t.Errorf("--talosconfig must not have a custom completion func; cobra's default file completion is the right shape") + } +} + +// TestComplete_AnchorCommands_NoValidArgsFunction pins the +// inverse contract: applyCmd, templateCmd, and the upgrade wrapper +// all declare cobra.NoArgs (or upstream's equivalent), which +// suppresses ValidArgsFunction in cobra's __complete path. Wiring +// a positional completer onto these commands would pin dead +// surface — the completer never reaches the shell, but tests +// would pass anyway. This test makes the absence the contract. +func TestComplete_AnchorCommands_NoValidArgsFunction(t *testing.T) { + for _, cmd := range []*cobra.Command{applyCmd, templateCmd} { + if cmd.ValidArgsFunction != nil { + t.Errorf("%q must NOT register a ValidArgsFunction — cobra.NoArgs suppresses it, leaving dead wiring; complete --file via RegisterFlagCompletionFunc instead", cmd.Name()) + } + } +} + +// TestComplete_PositionalNodeFiles_IncludesFilesWithLeadingComments +// pins that completion stays in sync with the modeline parser +// relaxation: a node file produced by `talm template -I` +// against a previously edited file carries a leading operator- +// comment block above the modeline. Operators must still see it +// in `talm apply ` / `talm upgrade `. The dual-parser +// split that originally shipped left completion on the strict +// parser, silently dropping leading-comment files from the list. +func TestComplete_PositionalNodeFiles_IncludesFilesWithLeadingComments(t *testing.T) { + rootOrig := Config.RootDir + t.Cleanup(func() { Config.RootDir = rootOrig }) + + dir := t.TempDir() + Config.RootDir = dir + + nodesDir := filepath.Join(dir, "nodes") + if err := os.Mkdir(nodesDir, 0o755); err != nil { + t.Fatal(err) + } + + body := "# Operator note: reset 2026-05-12 after ticket OPS-1234\n" + + "# DO NOT edit values directly; modify values.yaml and re-template\n" + + "# talm: nodes=[\"1.2.3.4\"], endpoints=[\"1.2.3.4\"], templates=[\"templates/cp.yaml\"]\n" + + "machine:\n type: controlplane\n" + if err := os.WriteFile(filepath.Join(nodesDir, "cp01.yaml"), []byte(body), 0o644); err != nil { + t.Fatal(err) + } + + got, _ := completeNodeFiles(nil, nil, "") + + want := filepath.Join("nodes", "cp01.yaml") + if !slices.Contains(got, want) { + t.Errorf("file with leading operator comments must appear in completion; got %v", got) + } +} diff --git a/pkg/commands/contract_apply_chain_stack_test.go b/pkg/commands/contract_apply_chain_stack_test.go new file mode 100644 index 00000000..cc25cef0 --- /dev/null +++ b/pkg/commands/contract_apply_chain_stack_test.go @@ -0,0 +1,337 @@ +// Copyright Cozystack Authors +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package commands + +import ( + "context" + "os" + "path/filepath" + "strings" + "testing" + + cerrors "github.com/cockroachdb/errors" + "github.com/cozystack/talm/pkg/engine" + "github.com/siderolabs/talos/pkg/machinery/client" +) + +// TestApplyTemplatesPerNode_StacksSidePatches_SingleApplyPerNode +// pins the stack-semantics contract: when the operator passes +// `talm apply -f anchor.yaml -f side1.yaml -f side2.yaml`, talm +// must call ApplyConfiguration exactly ONCE per node with the +// composed bundle, not three times where each later call +// overwrites the earlier one. An earlier design iterated each -f +// file as an independent apply — Talos replaces the whole +// MachineConfig per call, so the second/third apply silently +// destroyed everything from the first. +// +// The test runs applyTemplatesPerNode with two side-patches and +// captures every (render, apply) invocation. Expectations: +// - render called once per node (anchor's chart render) +// - apply called once per node with the final merged bytes +// - the final bytes contain content from BOTH side-patches in +// order — proves the patches stacked instead of overwriting. +func TestApplyTemplatesPerNode_StacksSidePatches_SingleApplyPerNode(t *testing.T) { + dir := t.TempDir() + + // Anchor: modeline-only file so MergeFileAsPatch on it is a + // no-op and the test focuses on the side-patch chain. + anchor := filepath.Join(dir, "anchor.yaml") + if err := os.WriteFile(anchor, []byte( + "# talm: nodes=[\"1.2.3.4\"], templates=[\"templates/cp.yaml\"]\n", + ), 0o644); err != nil { + t.Fatal(err) + } + + // Two side-patches with distinctive content so the final + // merged bytes prove both were applied. + side1 := filepath.Join(dir, "side1.yaml") + if err := os.WriteFile(side1, []byte( + "machine:\n certSANs:\n - canary.side1.example.com\n", + ), 0o644); err != nil { + t.Fatal(err) + } + + side2 := filepath.Join(dir, "side2.yaml") + if err := os.WriteFile(side2, []byte( + "machine:\n network:\n hostname: canary-side2\n", + ), 0o644); err != nil { + t.Fatal(err) + } + + nodes := []string{testNodeAddrA} + var renderCalls int + var applyBodies [][]byte + + render := func(_ context.Context, _ *client.Client, _ engine.Options) ([]byte, error) { + renderCalls++ + + // Minimal valid machine config so MergeFileAsPatch can + // parse and merge the side-patches. + return []byte("version: v1alpha1\nmachine:\n type: controlplane\n"), nil + } + apply := func(_ context.Context, _ *client.Client, data []byte) error { + applyBodies = append(applyBodies, append([]byte(nil), data...)) + + return nil + } + + err := applyTemplatesPerNode(engine.Options{}, anchor, []string{side1, side2}, nodes, fakeAuthOpenClient(context.Background()), render, apply) + if err != nil { + t.Fatalf("applyTemplatesPerNode: %v", err) + } + + if renderCalls != 1 { + t.Errorf("render must be called once per node (1); got %d", renderCalls) + } + + if len(applyBodies) != 1 { + t.Fatalf("apply must be called exactly once per node (an earlier loop-per-file design called it N times, second apply overwriting the first); got %d", len(applyBodies)) + } + + final := string(applyBodies[0]) + if !strings.Contains(final, "canary.side1.example.com") { + t.Errorf("final merged bytes must include side1's certSAN; got:\n%s", final) + } + if !strings.Contains(final, "canary-side2") { + t.Errorf("final merged bytes must include side2's hostname; got:\n%s", final) + } +} + +// TestApplyTemplatesPerNode_StacksSidePatches_LastWriterWins pins +// the ordering invariant: when two side-patches set the SAME field +// to different values, the LATER one wins. The sibling stack test +// only proves both patches land; this test proves the merge is +// ordered, which is the contract engine.MergeFileAsPatch upholds +// when called in a loop ("each sidePatch in order"). Without this +// pin, a future refactor that swaps the merge loop for an order- +// blind walk (e.g. map iteration) would pass the both-present +// assertion while silently producing non-deterministic results on +// collision. +func TestApplyTemplatesPerNode_StacksSidePatches_LastWriterWins(t *testing.T) { + dir := t.TempDir() + + anchor := filepath.Join(dir, "anchor.yaml") + if err := os.WriteFile(anchor, []byte( + "# talm: nodes=[\"1.2.3.4\"], templates=[\"templates/cp.yaml\"]\n", + ), 0o644); err != nil { + t.Fatal(err) + } + + // Both side-patches set machine.network.hostname to different + // values. Per last-writer-wins, side2's value must end up in + // the merged config; side1's must NOT leak through. + side1 := filepath.Join(dir, "side1.yaml") + if err := os.WriteFile(side1, []byte( + "machine:\n network:\n hostname: from-side1\n", + ), 0o644); err != nil { + t.Fatal(err) + } + + side2 := filepath.Join(dir, "side2.yaml") + if err := os.WriteFile(side2, []byte( + "machine:\n network:\n hostname: from-side2\n", + ), 0o644); err != nil { + t.Fatal(err) + } + + var applyBody []byte + + render := func(_ context.Context, _ *client.Client, _ engine.Options) ([]byte, error) { + return []byte("version: v1alpha1\nmachine:\n type: controlplane\n"), nil + } + apply := func(_ context.Context, _ *client.Client, data []byte) error { + applyBody = append([]byte(nil), data...) + + return nil + } + + if err := applyTemplatesPerNode(engine.Options{}, anchor, []string{side1, side2}, []string{testNodeAddrA}, fakeAuthOpenClient(context.Background()), render, apply); err != nil { + t.Fatalf("applyTemplatesPerNode: %v", err) + } + + final := string(applyBody) + if !strings.Contains(final, "from-side2") { + t.Errorf("last side-patch must win on collision; final config must include from-side2; got:\n%s", final) + } + if strings.Contains(final, "from-side1") { + t.Errorf("earlier side-patch's value must be overwritten by the later patch's value; from-side1 leaked into final:\n%s", final) + } +} + +// TestApplyHelpText_DocumentsAnchorSidePatchContract pins that +// `talm apply --help` surfaces the multi-file contract: first -f +// is the anchor (modelined, rooted), subsequent -f are side- +// patches stacked on top. Operators reading --help should not be +// surprised by the behaviour change from "batch independent +// applies" to "stacked composition". +func TestApplyHelpText_DocumentsAnchorSidePatchContract(t *testing.T) { + if !strings.Contains(applyCmd.Long, "side-patch") { + t.Errorf("apply Long must document side-patch semantics; got:\n%s", applyCmd.Long) + } + if !strings.Contains(applyCmd.Long, "anchor") { + t.Errorf("apply Long must name the anchor concept; got:\n%s", applyCmd.Long) + } + + fileFlag := applyCmd.Flags().Lookup("file") + if fileFlag == nil { + t.Fatal("apply --file flag missing") + } + if !strings.Contains(fileFlag.Usage, "anchor") || !strings.Contains(fileFlag.Usage, "side-patch") { + t.Errorf("--file Usage must mention anchor + side-patch contract; got:\n%s", fileFlag.Usage) + } +} + +// TestApplyOneFile_MalformedModeline_SurfacesParseError pins the +// fix for the silent-routing bug: a file with a present-but- +// malformed modeline (e.g. typo `node` for `nodes`, broken JSON +// value, prose before the modeline) MUST surface the parse error +// to the operator rather than route into the direct-patch path +// where the real cause is hidden behind a misleading "no nodes" +// hint. fileHasTalmModeline now distinguishes +// modeline.ErrModelineNotFound (legitimate "no `# talm:` line") +// from any other parse failure. +func TestApplyOneFile_MalformedModeline_SurfacesParseError(t *testing.T) { + dir := t.TempDir() + + file := filepath.Join(dir, "node.yaml") + // `node=` (typo, missing `s`) — ParseModeline accepts the key + // but stops short of mapping it to Config.Nodes. The real + // cause we want to surface is malformed-JSON values; use a + // broken JSON array which ParseModeline DOES reject. + if err := os.WriteFile(file, []byte( + "# talm: nodes=[broken json, endpoints=[\"1.2.3.4\"]\n", + ), 0o644); err != nil { + t.Fatal(err) + } + + err := applyOneFile(file, nil) + if err == nil { + t.Fatal("expected parse error to surface; got nil") + } + + msg := err.Error() + if strings.Contains(msg, "no modeline in") || strings.Contains(msg, "no nodes available") { + t.Errorf("malformed-modeline parse error must surface to the operator, not get rewritten as 'no nodes'; got:\n%s", msg) + } + if !strings.Contains(msg, "parsing modeline") { + t.Errorf("error must name the parse step so the operator can locate the typo; got:\n%s", msg) + } +} + +// TestApplyOneFile_NonModelinedAnchorWithSidePatches_Rejected pins +// the failure mode for a chain whose first -f file lacks a +// modeline: rejected with a hint pointing at the missing modeline. +// Without this gate, talm would try to feed the non-modelined +// anchor through direct-patch mode while ignoring the side-patches +// entirely. +func TestApplyOneFile_NonModelinedAnchorWithSidePatches_Rejected(t *testing.T) { + dir := t.TempDir() + + anchor := filepath.Join(dir, "anchor.yaml") + if err := os.WriteFile(anchor, []byte( + "machine:\n type: controlplane\n", + ), 0o644); err != nil { + t.Fatal(err) + } + + side := filepath.Join(dir, "side.yaml") + if err := os.WriteFile(side, []byte("machine: {}\n"), 0o644); err != nil { + t.Fatal(err) + } + + err := applyOneFile(anchor, []string{side}) + if err == nil { + t.Fatal("expected error: chain anchor without modeline must be rejected") + } + if !strings.Contains(err.Error(), "modeline") { + t.Errorf("error must name the missing modeline; got: %v", err) + } +} + +// TestApplyOneFile_ModelinedSidePatches_RejectedWithLoopHint pins the +// guard against a silent breaking change. Before this gate, a chain +// of two modelined node files (`talm apply -f n1.yaml -f n2.yaml`, +// the kubectl-style multi-node-apply pattern) would silently treat +// n2 as a bytes-level patch stacked onto n1's render: n2's modeline +// became a YAML comment under MergeFileAsPatch, its body got merged +// into n1's targets, and n2's nodes/templates/endpoints were never +// reached. The gate rejects this shape loudly with a hint pointing +// at the per-file shell loop that produces independent applies. +func TestApplyOneFile_ModelinedSidePatches_RejectedWithLoopHint(t *testing.T) { + dir := t.TempDir() + + anchor := filepath.Join(dir, "n1.yaml") + if err := os.WriteFile(anchor, []byte( + "# talm: nodes=[\"1.2.3.4\"], endpoints=[\"1.2.3.4\"], templates=[\"t.yaml\"]\n"+ + "machine:\n type: controlplane\n", + ), 0o644); err != nil { + t.Fatal(err) + } + + sideWithModeline := filepath.Join(dir, "n2.yaml") + if err := os.WriteFile(sideWithModeline, []byte( + "# talm: nodes=[\"5.6.7.8\"], endpoints=[\"5.6.7.8\"], templates=[\"t.yaml\"]\n"+ + "machine:\n type: controlplane\n", + ), 0o644); err != nil { + t.Fatal(err) + } + + err := applyOneFile(anchor, []string{sideWithModeline}) + if err == nil { + t.Fatal("expected error: modelined side-patch must be rejected to prevent silent multi-node-apply breakage") + } + + msg := err.Error() + if !strings.Contains(msg, "side-patch") || !strings.Contains(msg, "modeline") { + t.Errorf("error must name both the side-patch role and the modeline cause; got: %v", err) + } + + hints := cerrors.GetAllHints(err) + joinedHints := strings.Join(hints, "\n") + if !strings.Contains(joinedHints, "for f in") && !strings.Contains(joinedHints, "loop") { + t.Errorf("hint must point at the per-file shell loop pattern as the correct multi-node-apply shape; got hints:\n%s", joinedHints) + } +} + +// TestApplyOneFile_MalformedModelineInSidePatch_SurfacesParseError +// pins that a side-patch's malformed modeline surfaces as a parse +// error, not silently merged into the anchor's render. Operators +// should fix the typo regardless of which file slot it lives in. +func TestApplyOneFile_MalformedModelineInSidePatch_SurfacesParseError(t *testing.T) { + dir := t.TempDir() + + anchor := filepath.Join(dir, "anchor.yaml") + if err := os.WriteFile(anchor, []byte( + "# talm: nodes=[\"1.2.3.4\"], endpoints=[\"1.2.3.4\"], templates=[\"t.yaml\"]\n"+ + "machine:\n type: controlplane\n", + ), 0o644); err != nil { + t.Fatal(err) + } + + badSide := filepath.Join(dir, "bad-side.yaml") + if err := os.WriteFile(badSide, []byte( + "# talm: nodes=[broken json, endpoints=[\"1.2.3.4\"]\n", + ), 0o644); err != nil { + t.Fatal(err) + } + + err := applyOneFile(anchor, []string{badSide}) + if err == nil { + t.Fatal("expected error: malformed modeline in side-patch must surface") + } + if !strings.Contains(err.Error(), "parsing modeline in side-patch") { + t.Errorf("error must name the side-patch parse step; got: %v", err) + } +} diff --git a/pkg/commands/contract_init_ux_test.go b/pkg/commands/contract_init_ux_test.go index 2e60cb30..7536d50f 100644 --- a/pkg/commands/contract_init_ux_test.go +++ b/pkg/commands/contract_init_ux_test.go @@ -286,9 +286,10 @@ func TestContract_InitRun_PreCheckListsAllConflicts(t *testing.T) { } } -// Contract: when os.Getwd() fails, PreRunE returns the wrapped error -// rather than fail-open into the partial-overlay risk that #156 was -// about. +// Contract: when os.Getwd() fails, PreRunE returns the wrapped +// error rather than fail-open into the partial-overlay risk where +// init walks up the tree and writes preset files into the wrong +// project root. // // Reproducer: chdir into a temp dir, remove that dir, then call PreRunE. // On Linux the kernel returns ENOENT from getcwd(2); the guard wraps it. diff --git a/pkg/commands/contract_root_detection_test.go b/pkg/commands/contract_root_detection_test.go index 3607dc26..7c879e80 100644 --- a/pkg/commands/contract_root_detection_test.go +++ b/pkg/commands/contract_root_detection_test.go @@ -24,7 +24,10 @@ package commands import ( "os" "path/filepath" + "strings" "testing" + + "github.com/cockroachdb/errors" ) // makeProjectRoot creates a `Chart.yaml` and `secrets.yaml` (the two @@ -214,7 +217,14 @@ func TestContract_ValidateAndDetectRootsForFiles_SingleRoot(t *testing.T) { // commandline. talm refuses to apply a config built from // inconsistent inputs (it cannot meaningfully merge two project // configs in one apply). -func TestContract_ValidateAndDetectRootsForFiles_DifferentRootsError(t *testing.T) { +// TestContract_ValidateAndDetectRootsForFiles_DifferentRoots_FirstFileWins +// pins the contract: the first `-f` file anchors the project root; +// subsequent files in DIFFERENT roots are loaded as patches without +// re-detecting. An earlier loop-rejected any divergence between +// filePaths[i] and filePaths[0]; that gate was the same one +// blocking the side-patch use case, so the flip is the same change +// as for an outright-orphan second file. +func TestContract_ValidateAndDetectRootsForFiles_DifferentRoots_FirstFileWins(t *testing.T) { rootA := t.TempDir() rootB := t.TempDir() makeProjectRoot(t, rootA) @@ -227,41 +237,113 @@ func TestContract_ValidateAndDetectRootsForFiles_DifferentRootsError(t *testing. if err := os.WriteFile(fileB, []byte(""), 0o600); err != nil { t.Fatal(err) } - _, err := ValidateAndDetectRootsForFiles([]string{fileA, fileB}) - if err == nil { - t.Fatal("expected error for files in different roots") + + got, err := ValidateAndDetectRootsForFiles([]string{fileA, fileB}) + if err != nil { + t.Fatalf("first file anchors root; second file's divergent root must be ignored, got error: %v", err) + } + + wantRoot, _ := filepath.EvalSymlinks(rootA) + gotRoot, _ := filepath.EvalSymlinks(got) + if gotRoot != wantRoot { + t.Errorf("anchor root = %q, want %q (first file's project root wins)", gotRoot, wantRoot) } } -// Contract: a file whose directory has no Chart.yaml/secrets up the -// tree at all is reported by name in the error so the operator can -// see WHICH file failed root detection. -func TestContract_ValidateAndDetectRootsForFiles_OrphanFileError(t *testing.T) { +// TestContract_ValidateAndDetectRootsForFiles_FirstRootedSecondOrphan_Accepted +// pins the canonical side-patch case: a rooted node file +// followed by a `-f /tmp/patch.yaml` outside any project. The first +// file anchors the root, the orphan is loaded as a patch. +func TestContract_ValidateAndDetectRootsForFiles_FirstRootedSecondOrphan_Accepted(t *testing.T) { root := t.TempDir() makeProjectRoot(t, root) - // Orphan: file outside any project. + rootedFile := filepath.Join(root, "node.yaml") + if err := os.WriteFile(rootedFile, []byte(""), 0o600); err != nil { + t.Fatal(err) + } + orphanDir := t.TempDir() orphan := filepath.Join(orphanDir, "loose.yaml") if err := os.WriteFile(orphan, []byte(""), 0o600); err != nil { t.Fatal(err) } + + // Skip if the test environment's tmp ancestor happens to be a + // talm project — the orphan would resolve to a real root and + // the test premise would be invalid. + if got, _ := DetectProjectRootForFile(orphan); got != "" { + t.Skipf("test environment has project markers above %q; skipping orphan side-patch path", orphanDir) + } + + got, err := ValidateAndDetectRootsForFiles([]string{rootedFile, orphan}) + if err != nil { + t.Fatalf("first-rooted + second-orphan must succeed under the first-file-anchors contract; got error: %v", err) + } + + wantRoot, _ := filepath.EvalSymlinks(root) + gotRoot, _ := filepath.EvalSymlinks(got) + if gotRoot != wantRoot { + t.Errorf("anchor root = %q, want %q (first file's project root wins)", gotRoot, wantRoot) + } +} + +// TestContract_ValidateAndDetectRootsForFiles_FirstOrphan_RejectedWithReorderHint +// pins the inverse: orphan first, rooted second. Under the +// first-file-anchors contract this is rejected — there is no +// rooted anchor to begin with. The hint must tell the operator +// to reorder, not to move the file under a project. +func TestContract_ValidateAndDetectRootsForFiles_FirstOrphan_RejectedWithReorderHint(t *testing.T) { + root := t.TempDir() + makeProjectRoot(t, root) + rootedFile := filepath.Join(root, "node.yaml") if err := os.WriteFile(rootedFile, []byte(""), 0o600); err != nil { t.Fatal(err) } - // Note: this test passes only when $TMPDIR's ancestors do NOT - // happen to contain Chart.yaml + secrets.yaml. On a developer - // machine with such markers it would resolve to that. Skip if so. - got, _ := DetectProjectRootForFile(orphan) - if got != "" { - t.Skipf("test environment has project markers above %q; skipping orphan-error path", orphanDir) + orphanDir := t.TempDir() + orphan := filepath.Join(orphanDir, "loose.yaml") + if err := os.WriteFile(orphan, []byte(""), 0o600); err != nil { + t.Fatal(err) + } + + if got, _ := DetectProjectRootForFile(orphan); got != "" { + t.Skipf("test environment has project markers above %q; skipping orphan-first rejection path", orphanDir) + } + + _, err := ValidateAndDetectRootsForFiles([]string{orphan, rootedFile}) + if err == nil { + t.Fatal("expected error: first file with no project root must reject the whole chain") + } + + // The error message + hints must guide the operator to reorder. + hints := strings.Join(errors.GetAllHints(err), "\n") + full := err.Error() + "\n" + hints + if !strings.Contains(full, "first") { + t.Errorf("error / hint must explain that the FIRST file anchors the root; got:\n%s", full) + } +} + +// TestContract_ValidateAndDetectRootsForFiles_OnlyOrphan_StillRejected +// pins the no-contract-regression case: a single orphan file is +// rejected exactly as before. The first-file-anchors relaxation +// only widens the multi-file case; single-file behaviour stays +// strict. +func TestContract_ValidateAndDetectRootsForFiles_OnlyOrphan_StillRejected(t *testing.T) { + orphanDir := t.TempDir() + orphan := filepath.Join(orphanDir, "loose.yaml") + if err := os.WriteFile(orphan, []byte(""), 0o600); err != nil { + t.Fatal(err) + } + + if got, _ := DetectProjectRootForFile(orphan); got != "" { + t.Skipf("test environment has project markers above %q; skipping orphan-only rejection path", orphanDir) } - _, err := ValidateAndDetectRootsForFiles([]string{rootedFile, orphan}) + _, err := ValidateAndDetectRootsForFiles([]string{orphan}) if err == nil { - t.Fatal("expected error for orphan file") + t.Fatal("single-file orphan must continue to error (no contract regression)") } } diff --git a/pkg/commands/contract_root_dispatch_test.go b/pkg/commands/contract_root_dispatch_test.go index 8815e218..8fdffa1e 100644 --- a/pkg/commands/contract_root_dispatch_test.go +++ b/pkg/commands/contract_root_dispatch_test.go @@ -375,8 +375,12 @@ func TestContract_DetectAndSetRoot_FromFileFlag(t *testing.T) { } // Contract: when -f points at files under DIFFERENT project roots, -// the function errors. Pinning the strict-consistency rule. -func TestContract_DetectAndSetRoot_FilesInDifferentRootsError(t *testing.T) { +// the first file anchors the chain root. The second file's +// divergent root is no longer a failure — it is loaded as a patch +// on top of the first file's project. This test pins the relaxed +// contract; the previous strict-consistency rule was the same gate +// blocking the side-patch case. +func TestContract_DetectAndSetRoot_FilesInDifferentRoots_FirstFileWins(t *testing.T) { withConfigSnapshot(t) rootA := t.TempDir() @@ -403,9 +407,14 @@ func TestContract_DetectAndSetRoot_FilesInDifferentRootsError(t *testing.T) { } withOSArgs(t, []string{fixtureBinaryName}) - err := DetectAndSetRoot(cmd, nil) - if err == nil { - t.Fatal("expected error for files in different roots") + if err := DetectAndSetRoot(cmd, nil); err != nil { + t.Fatalf("first file must anchor the root; second file's divergent root must be ignored, got error: %v", err) + } + + wantRoot, _ := filepath.EvalSymlinks(rootA) + gotRoot, _ := filepath.EvalSymlinks(Config.RootDir) + if gotRoot != wantRoot { + t.Errorf("Config.RootDir = %q, want %q (first file's project root)", gotRoot, wantRoot) } } diff --git a/pkg/commands/contract_template_inplace_preserve_test.go b/pkg/commands/contract_template_inplace_preserve_test.go new file mode 100644 index 00000000..5c8a45e0 --- /dev/null +++ b/pkg/commands/contract_template_inplace_preserve_test.go @@ -0,0 +1,113 @@ +// Copyright Cozystack Authors +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package commands + +import ( + "os" + "path/filepath" + "strings" + "testing" +) + +// TestPrependLeadingComments_PreservesAllLines pins the canonical +// in-place rewrite shape: operator-authored `#`-prefixed +// and blank lines that lived above the modeline in the source file +// are re-emitted at the top of the regenerated content. Without +// this guarantee, `talm template -I` silently strips operator +// documentation on every rewrite. +func TestPrependLeadingComments_PreservesAllLines(t *testing.T) { + leading := []string{ + "# Note 1: reset 2026-05-12 after ticket OPS-1234", + "# Note 2: DO NOT edit values directly", + "", + "# Note 3: extra context", + } + rendered := "# talm: nodes=[\"1.2.3.4\"]\n# THIS FILE IS AUTOGENERATED…\nmachine:\n type: controlplane\n" + + got := prependLeadingComments(leading, rendered) + + // Every leading line must appear, in order, before the + // rendered output. + want := "# Note 1: reset 2026-05-12 after ticket OPS-1234\n" + + "# Note 2: DO NOT edit values directly\n" + + "\n" + + "# Note 3: extra context\n" + + rendered + if got != want { + t.Errorf("prependLeadingComments mismatch\ngot:\n%s\n\nwant:\n%s", got, want) + } +} + +// TestPrependLeadingComments_EmptyLeading_ReturnsOutputUnchanged +// pins the degenerate case: no leading comments → output passes +// through verbatim. Pinning this avoids a future refactor where +// the helper accidentally appends an empty line for the +// "no comments" case. +func TestPrependLeadingComments_EmptyLeading_ReturnsOutputUnchanged(t *testing.T) { + rendered := "# talm: nodes=[\"1.2.3.4\"]\nmachine:\n type: controlplane\n" + + if got := prependLeadingComments(nil, rendered); got != rendered { + t.Errorf("nil leading must pass output through; got %q want %q", got, rendered) + } + + if got := prependLeadingComments([]string{}, rendered); got != rendered { + t.Errorf("empty leading must pass output through; got %q want %q", got, rendered) + } +} + +// TestWriteInplaceRendered_PreservesOperatorCommentsAboveModeline +// pins the end-to-end behaviour wired through templateOneFile: +// when `talm template -I` rewrites a node file, the operator's +// leading-comment block survives the regeneration. Reaches in via +// the same writeInplaceRendered helper templateOneFile uses, so a +// future refactor that changes the call site (different writer, +// different file path, etc.) but keeps the helper signature must +// still pass this contract. +func TestWriteInplaceRendered_PreservesOperatorCommentsAboveModeline(t *testing.T) { + dir := t.TempDir() + file := filepath.Join(dir, "node.yaml") + + leading := []string{ + "# This node was reset 2026-05-12 after ticket OPS-1234", + "# DO NOT edit values directly; modify values.yaml and re-template", + } + rendered := "# talm: nodes=[\"1.2.3.4\"], endpoints=[\"1.2.3.4\"], templates=[\"templates/cp.yaml\"]\n" + + "# THIS FILE IS AUTOGENERATED. PREFER TEMPLATE EDITS OVER MANUAL ONES.\n" + + "machine:\n type: controlplane\n" + + combined := prependLeadingComments(leading, rendered) + + if err := writeInplaceRendered(file, combined); err != nil { + t.Fatalf("writeInplaceRendered: %v", err) + } + + got, err := os.ReadFile(file) + if err != nil { + t.Fatalf("read back: %v", err) + } + + // The written file must start with the operator's leading + // lines, then the modeline, then the rendered body — in this + // exact order. + gotStr := string(got) + if !strings.HasPrefix(gotStr, leading[0]+"\n"+leading[1]+"\n") { + t.Errorf("operator comments missing or out of order at start of file; got:\n%s", gotStr) + } + + modelineIdx := strings.Index(gotStr, "# talm:") + if modelineIdx <= 0 { + t.Errorf("modeline must follow the leading comments, not start the file; got:\n%s", gotStr) + } +} diff --git a/pkg/commands/contract_template_set_help_test.go b/pkg/commands/contract_template_set_help_test.go new file mode 100644 index 00000000..631fadf9 --- /dev/null +++ b/pkg/commands/contract_template_set_help_test.go @@ -0,0 +1,76 @@ +// Copyright Cozystack Authors +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package commands + +import ( + "strings" + "testing" +) + +// TestTemplateFlag_SetHelpText_MentionsSetString pins the operator- +// facing UX: `talm template --help` must surface the --set-string +// escape hatch directly in the --set flag description. Operators +// hitting the strvals dot-nesting footgun (e.g. +// `--set endpoint=10.0.0.1` parsed as nested map) check --help +// before CHANGELOG. The Usage string must name --set-string so the +// hint lands in front of them. +func TestTemplateFlag_SetHelpText_MentionsSetString(t *testing.T) { + flag := templateCmd.Flags().Lookup("set") + if flag == nil { + t.Fatal("expected templateCmd to register a --set flag, got nil") + } + + if !strings.Contains(flag.Usage, "--set-string") { + t.Errorf("--set Usage must point at --set-string as the escape hatch for IP / version literals; got:\n%s", flag.Usage) + } +} + +// TestTemplateFlag_SetStringHelpText_MentionsLiteralStability pins +// the sibling contract: --set-string Usage must describe its niche +// (IP / CIDR / version literals — values that must not be type- +// coerced or dot-nested). Without this, the help text reads as +// "set STRING values" with no clue when to prefer it over --set. +func TestTemplateFlag_SetStringHelpText_MentionsLiteralStability(t *testing.T) { + flag := templateCmd.Flags().Lookup("set-string") + if flag == nil { + t.Fatal("expected templateCmd to register a --set-string flag, got nil") + } + + // The Usage must surface the literal-stability niche. The + // word "literal" is the canonical anchor; the sibling words + // (IP / CIDR / version) all describe shapes that need it. + usage := strings.ToLower(flag.Usage) + if !strings.Contains(usage, "literal") { + t.Errorf("--set-string Usage must name the literal-stability use case; got:\n%s", flag.Usage) + } +} + +// TestTemplateFlag_SetStringHelpText_DoesNotOverpromiseHostnames pins +// the alignment between the --set-string Usage copy and the actual +// detector in pkg/engine/setvalue_warn.go. The screener matches +// IPv4 / IPv4 CIDR / semver shapes only — hostnames like +// `foo.example.com` are NOT flagged, so the Usage must NOT advertise +// hostname coverage. A previous revision did, which mislead operators +// into thinking `--set host=foo.example.com` would be screened. +func TestTemplateFlag_SetStringHelpText_DoesNotOverpromiseHostnames(t *testing.T) { + flag := templateCmd.Flags().Lookup("set-string") + if flag == nil { + t.Fatal("expected templateCmd to register a --set-string flag, got nil") + } + + if strings.Contains(strings.ToLower(flag.Usage), "hostname") { + t.Errorf("--set-string Usage must NOT claim hostname coverage — the screener does not detect hostname-shaped values; got:\n%s", flag.Usage) + } +} diff --git a/pkg/commands/contract_upgrade_image_source_test.go b/pkg/commands/contract_upgrade_image_source_test.go new file mode 100644 index 00000000..9a7f67f9 --- /dev/null +++ b/pkg/commands/contract_upgrade_image_source_test.go @@ -0,0 +1,141 @@ +// Copyright Cozystack Authors +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package commands + +import ( + "os" + "path/filepath" + "strings" + "testing" + + "github.com/cockroachdb/errors" +) + +// TestResolveUpgradeImageFromValues_ReadsImage pins the canonical +// path: `talm upgrade` reads its target installer image +// from values.yaml (the source of truth for cluster-wide knobs), +// not from the rendered node body. The previous implementation +// called engine.FullConfigProcess on the node body and extracted +// machine.install.image from the result — silently stuck on the +// LAST templated image, even after the operator bumped +// values.yaml::image to upgrade the cluster. +func TestResolveUpgradeImageFromValues_ReadsImage(t *testing.T) { + dir := t.TempDir() + + values := "image: ghcr.io/siderolabs/installer:v1.13.0\n" + + "endpoint: https://192.0.2.10:6443\n" + if err := os.WriteFile(filepath.Join(dir, "values.yaml"), []byte(values), 0o644); err != nil { + t.Fatal(err) + } + + got, err := resolveUpgradeImageFromValues(dir) + if err != nil { + t.Fatalf("resolveUpgradeImageFromValues: %v", err) + } + + want := "ghcr.io/siderolabs/installer:v1.13.0" + if got != want { + t.Errorf("upgrade image must come from values.yaml; got %q want %q", got, want) + } +} + +// TestResolveUpgradeImageFromValues_MissingImage_ErrorWithHint +// pins the operator-facing failure mode: when values.yaml exists +// but has no `image` key, the error must name the file AND point +// at the recovery path (set the key OR pass --image explicitly). +// Without the hint, the operator sees a bare "image not set" and +// has to grep the codebase to find where to fix it. +func TestResolveUpgradeImageFromValues_MissingImage_ErrorWithHint(t *testing.T) { + dir := t.TempDir() + + values := "endpoint: https://192.0.2.10:6443\n" + if err := os.WriteFile(filepath.Join(dir, "values.yaml"), []byte(values), 0o644); err != nil { + t.Fatal(err) + } + + _, err := resolveUpgradeImageFromValues(dir) + if err == nil { + t.Fatal("expected error when values.yaml lacks `image`; got nil") + } + + if !strings.Contains(err.Error(), "image") { + t.Errorf("error must name the missing field; got: %v", err) + } + + if !strings.Contains(err.Error(), "values.yaml") { + t.Errorf("error must name values.yaml so the operator locates the file to edit; got: %v", err) + } + + hints := errors.GetAllHints(err) + if len(hints) == 0 { + t.Fatalf("expected at least one hint guiding the operator; got bare error: %v", err) + } + + combined := strings.Join(hints, "\n") + if !strings.Contains(combined, "--image") { + t.Errorf("hint chain must mention the --image escape hatch; got: %s", combined) + } +} + +// TestResolveUpgradeImageFromValues_MissingValuesYAML_Error pins +// the missing-file error path. The operator-facing error must +// surface the path so they can locate the project root mismatch. +func TestResolveUpgradeImageFromValues_MissingValuesYAML_Error(t *testing.T) { + dir := t.TempDir() // no values.yaml inside + + _, err := resolveUpgradeImageFromValues(dir) + if err == nil { + t.Fatal("expected error when values.yaml is missing; got nil") + } + + if !strings.Contains(err.Error(), "values.yaml") { + t.Errorf("error must name the missing values.yaml file; got: %v", err) + } +} + +// TestResolveUpgradeImageFromValues_EmptyImage_ErrorSameAsMissing +// pins the empty-string case: values.yaml with `image: ""` is +// indistinguishable from missing image — both must error out +// rather than silently passing an empty string downstream to +// talosctl, where the failure mode is opaque. Beyond non-nil, the +// error/hint shape must match the MissingImage path so operators +// get one consistent recovery story regardless of which precise +// shape (key absent vs. key empty) their values.yaml has. +func TestResolveUpgradeImageFromValues_EmptyImage_ErrorSameAsMissing(t *testing.T) { + dir := t.TempDir() + + values := "image: \"\"\nendpoint: https://192.0.2.10:6443\n" + if err := os.WriteFile(filepath.Join(dir, "values.yaml"), []byte(values), 0o644); err != nil { + t.Fatal(err) + } + + _, err := resolveUpgradeImageFromValues(dir) + if err == nil { + t.Fatal("expected error when values.yaml has empty image; got nil") + } + + if !strings.Contains(err.Error(), "image") || !strings.Contains(err.Error(), "values.yaml") { + t.Errorf("empty image must match missing-image error shape (names both `image` and `values.yaml`); got: %v", err) + } + + hints := errors.GetAllHints(err) + if len(hints) == 0 { + t.Fatalf("expected at least one hint for the empty-image case; got bare error: %v", err) + } + + if !strings.Contains(strings.Join(hints, "\n"), "--image") { + t.Errorf("empty-image hint chain must mention the --image escape hatch (same as missing-image); got: %v", hints) + } +} diff --git a/pkg/commands/init.go b/pkg/commands/init.go index 407ee40b..cc391e63 100644 --- a/pkg/commands/init.go +++ b/pkg/commands/init.go @@ -1176,7 +1176,11 @@ func updateTalmLibraryChart() error { presetName, err = readChartYamlPreset() if err != nil { - return errors.Wrap(err, "preset is required: use --preset flag or ensure Chart.yaml has a preset dependency") + // readChartYamlPreset already returns errors.WithHint("preset + // not found in Chart.yaml dependencies", "add a preset chart…"); + // wrapping that again would double-message the operator with + // the same fact. + return err } } @@ -1277,6 +1281,10 @@ func init() { initCmd.Flags().BoolVarP(&initCmdFlags.encrypt, "encrypt", "e", false, "encrypt all sensitive files (secrets.yaml, talosconfig, kubeconfig)") initCmd.Flags().BoolVarP(&initCmdFlags.decrypt, "decrypt", "d", false, "decrypt all encrypted files (does not require preset)") + // Shell completion for `talm init --preset`: preset names are + // baked in at build time via pkg/generated. + _ = initCmd.RegisterFlagCompletionFunc("preset", completePresetNames) + addCommand(initCmd) // Don't mark preset as required - it's validated in PreRunE based on --encrypt/--decrypt flags } diff --git a/pkg/commands/init_test.go b/pkg/commands/init_test.go index f6338cb7..72e63b3d 100644 --- a/pkg/commands/init_test.go +++ b/pkg/commands/init_test.go @@ -22,6 +22,7 @@ import ( "strings" "testing" + "github.com/cockroachdb/errors" "gopkg.in/yaml.v3" ) @@ -350,6 +351,74 @@ func TestUpdateTalmLibraryChartRejectsImageFlag(t *testing.T) { } } +// TestInitUpdate_PresetNotFoundError_SingleWrapped pins that the +// "preset not declared anywhere" error path for `talm init --update` +// surfaces once, not twice. readChartYamlPreset already returns a +// cockroachdb/errors.WithHint("preset not found in Chart.yaml +// dependencies", "add a preset chart…"); wrapping that again with +// errors.Wrap(err, "preset is required: use --preset flag or ensure +// Chart.yaml has a preset dependency") double-messages the operator: +// the rendered error becomes "preset is required: …: preset not +// found in Chart.yaml dependencies" — same fact twice. The fix is to +// return the inner error directly; its hint already covers both +// recovery paths (add a preset dependency OR pass --preset). +func TestInitUpdate_PresetNotFoundError_SingleWrapped(t *testing.T) { + imageOrig := initCmdFlags.image + presetOrig := initCmdFlags.preset + rootOrig := Config.RootDir + t.Cleanup(func() { + initCmdFlags.image = imageOrig + initCmdFlags.preset = presetOrig + Config.RootDir = rootOrig + }) + + initCmdFlags.image = "" + initCmdFlags.preset = "" + + dir := t.TempDir() + Config.RootDir = dir + + // Chart.yaml carrying only the `talm` library dependency, no + // preset chart. readChartYamlPreset will iterate the deps and + // return its "preset not found" error. + chartYaml := "apiVersion: v2\nname: test\nversion: 0.1.0\n" + + "dependencies:\n" + + " - name: talm\n" + + " version: 0.1.0\n" + if err := os.WriteFile(filepath.Join(dir, "Chart.yaml"), []byte(chartYaml), 0o644); err != nil { + t.Fatalf("seed Chart.yaml: %v", err) + } + + err := updateTalmLibraryChart() + if err == nil { + t.Fatal("expected --update without preset to error; got nil") + } + + msg := err.Error() + + // Double-wrap signature: the outer wrap text leaks into the + // rendered string. After the fix this substring is gone. + if strings.Contains(msg, "preset is required:") { + t.Errorf("error must be single-wrapped — found outer wrap text %q in:\n%s", "preset is required:", msg) + } + + // The inner cause must survive untouched. + if !strings.Contains(msg, "preset not found in Chart.yaml dependencies") { + t.Errorf("error must surface the inner cause; got:\n%s", msg) + } + + // The hint chain must still surface the recovery path. + hints := errors.GetAllHints(err) + if len(hints) == 0 { + t.Fatalf("expected at least one hint guiding the operator on how to declare a preset; got bare error:\n%s", msg) + } + + combined := strings.Join(hints, "\n") + if !strings.Contains(combined, "add a preset chart") { + t.Errorf("hint chain must mention adding a preset chart (or equivalent recovery); got:\n%s", combined) + } +} + // TestValidateImageRefShape_RejectsMalformed pins the shape check // that runs before any --image value reaches the preset rewrite. // Catches operator typos (::malformed, no-slash:tag, trailing diff --git a/pkg/commands/preflight_apply_safety_redact.go b/pkg/commands/preflight_apply_safety_redact.go index 2fa410fb..41805f34 100644 --- a/pkg/commands/preflight_apply_safety_redact.go +++ b/pkg/commands/preflight_apply_safety_redact.go @@ -30,10 +30,9 @@ import ( // pre-shared keys. // 3. The path has a stable form in v1alpha1. // -// Out of scope (matches the issue author's #189 scope): exhaustive -// sweep over every Sensitive-marked field in the Talos v1alpha1 -// schema. The list grows issue-by-issue when an operator reports a -// new leak; each addition should cite the symptom. +// Out of scope: exhaustive sweep over every Sensitive-marked field +// in the Talos v1alpha1 schema. The list grows when an operator +// reports a new leak; each addition should cite the symptom. // // Bracket-normalisation lets array-indexed paths // (cluster.acceptedCAs[2].key) match the wildcard form diff --git a/pkg/commands/preflight_test.go b/pkg/commands/preflight_test.go index 11277e8b..11ab7a8e 100644 --- a/pkg/commands/preflight_test.go +++ b/pkg/commands/preflight_test.go @@ -147,7 +147,7 @@ func TestPreflightCheckTalosVersion(t *testing.T) { wantHint: true, }, { - // Reproduction case from cozystack/talm#132: unset talosVersion + older node. + // Reproduction case: unset talosVersion + older node — the warning hint must still fire. name: "empty configured treated as current and warns", configured: "", readerVersion: "v1.11.6", diff --git a/pkg/commands/preflight_upgrade_verify.go b/pkg/commands/preflight_upgrade_verify.go index 9a3f33a1..dd4d355e 100644 --- a/pkg/commands/preflight_upgrade_verify.go +++ b/pkg/commands/preflight_upgrade_verify.go @@ -40,9 +40,9 @@ const postUpgradeVersionMismatchHint = "two hypotheses produce this symptom: " + // against the target version parsed from the installer image. Talos // auto-rolls back when a new boot fails its readiness check, but the // upgrade RPC has already acked — so success without verification is -// false advertising. See cozystack/talm#175 for the reproduction -// (cross-vendor installer image triggering an A/B rollback on a -// 3-node OCI v1.12.6 stand). +// false advertising. Reproducer: cross-vendor installer image +// triggers an A/B rollback because the new partition fails its +// boot readiness check; talosctl reports success regardless. // // Returns nil on match (or on best-effort read failure with --skip-* not // set; the caller decides). Returns a hint-bearing blocker on diff --git a/pkg/commands/reset_handler_test.go b/pkg/commands/reset_handler_test.go index 56dfa26a..1427db01 100644 --- a/pkg/commands/reset_handler_test.go +++ b/pkg/commands/reset_handler_test.go @@ -37,11 +37,12 @@ func registerResetFlagsForTest(cmd *cobra.Command, wipeModeStore *string, labels cmd.Flags().StringSliceVar(labelsStore, "system-labels-to-wipe", nil, "system disk partitions to wipe by label") } -// TestWrapResetCommand_NoFlags_AppliesSafeDefault pins the headline -// behaviour change for #185: when an operator runs `talm reset` -// without choosing a wipe scope, the wrapper's PreRunE pre-populates -// `--system-labels-to-wipe` with STATE,EPHEMERAL so the META -// partition is preserved and the node self-recovers on reboot. +// TestWrapResetCommand_NoFlags_AppliesSafeDefault pins the safe +// default for `talm reset`: when an operator runs the command +// without choosing a wipe scope, the wrapper's PreRunE +// pre-populates `--system-labels-to-wipe` with STATE,EPHEMERAL so +// the META partition is preserved and the node self-recovers on +// reboot. // // `--wipe-mode` is intentionally left unchanged (still upstream's // default). The server-side reset codepath, when SystemPartitionsToWipe diff --git a/pkg/commands/root.go b/pkg/commands/root.go index 49305cb3..802f90e1 100644 --- a/pkg/commands/root.go +++ b/pkg/commands/root.go @@ -201,44 +201,50 @@ func DetectProjectRootForFile(filePath string) (string, error) { return DetectProjectRoot(fileDir) } -// ValidateAndDetectRootsForFiles validates that all files belong to the same project root. -// Returns the common root directory and an error if files belong to different roots. +// ValidateAndDetectRootsForFiles resolves the project root for a +// chain of `-f` files. Only the FIRST file anchors the project +// root; subsequent files are loaded as patches without re-running +// detection, so a chain like +// `talm apply -f nodes/cp01.yaml -f /tmp/side-patch.yaml` +// is accepted — cp01.yaml carries the root, side-patch.yaml is +// patched on top without needing its own Chart.yaml ancestor. +// +// The first-file-anchors rule is ordering-dependent by design. +// Reversing the chain (orphan first, rooted second) is rejected +// with a hint that names the FIRST file and tells the operator to +// reorder, not to move the file. Single-file orphans continue to +// error out exactly as before. +// +// Wrapped talosctl subcommands (`talm dashboard -f …`, +// `talm reset -f …`, `talm get -f …`) also call this through their +// PreRunE chain. For them the "chain" notion isn't semantic — each +// file is its own per-node modeline source — but the relaxed +// first-file-anchors rule still applies: a cross-project chain that +// would have errored before now silently pins Config.RootDir to +// file[0]'s root. In practice operators don't mix files from +// different projects in a single talosctl invocation; if they do, +// EnsureTalosconfigPath downstream will use file[0]'s talosconfig. func ValidateAndDetectRootsForFiles(filePaths []string) (string, error) { if len(filePaths) == 0 { return "", nil } - var commonRoot string - - roots := make(map[string]bool) + anchor := filePaths[0] - for _, filePath := range filePaths { - fileRoot, err := DetectProjectRootForFile(filePath) - if err != nil { - return "", errors.Wrapf(err, "failed to detect root for file %s", filePath) - } - - if fileRoot == "" { - //nolint:wrapcheck // cockroachdb/errors.WithHint at boundary. - return "", errors.WithHint( - errors.Newf("failed to detect project root for file %s (Chart.yaml and secrets.yaml not found)", filePath), - "run `talm init` at the project root, or move the file under an existing talm project", - ) - } + fileRoot, err := DetectProjectRootForFile(anchor) + if err != nil { + return "", errors.Wrapf(err, "failed to detect root for file %s", anchor) + } - roots[fileRoot] = true - if commonRoot == "" { - commonRoot = fileRoot - } else if commonRoot != fileRoot { - //nolint:wrapcheck // cockroachdb/errors.WithHint at boundary. - return "", errors.WithHint( - errors.Newf("files belong to different project roots: %s and %s", commonRoot, fileRoot), - "run the command separately for each project, or pass files from a single project root", - ) - } + if fileRoot == "" { + //nolint:wrapcheck // cockroachdb/errors.WithHint at boundary. + return "", errors.WithHint( + errors.Newf("failed to detect project root for first file %s (Chart.yaml and secrets.yaml not found)", anchor), + "the first -f file anchors the project root; place it inside a `talm init`'d project, or reorder the -f chain so a rooted file comes first", + ) } - return commonRoot, nil + return fileRoot, nil } // DetectRootForTemplate detects the project root for a template file path. @@ -248,7 +254,14 @@ func DetectRootForTemplate(templatePath string) (string, error) { } func processModelineAndUpdateGlobals(configFile string, nodesFromArgs, endpointsFromArgs, overwrite bool) ([]string, error) { - modelineConfig, err := modeline.ReadAndParseModeline(configFile) + // FindAndParseModeline accepts operator-authored comment / blank + // lines before the modeline. Every workflow that consumes node + // files — apply, upgrade, template -I, completion, wrapped + // talosctl subcommands — must agree on file shape; a strict + // "first line must be modeline" rule would silently break the + // apply / upgrade / talosctl path against files that + // template -I just produced. + _, modelineConfig, err := modeline.FindAndParseModeline(configFile) if err != nil { // Don't print the error here — cobra surfaces the wrapped // return through stderr at the command level. Printing here diff --git a/pkg/commands/root_test.go b/pkg/commands/root_test.go index ccd47562..6c1f74c4 100644 --- a/pkg/commands/root_test.go +++ b/pkg/commands/root_test.go @@ -40,6 +40,48 @@ machine: } } +// TestProcessModelineAndUpdateGlobals_AcceptsLeadingComments pins +// that the modeline parser used by apply / upgrade / completion / +// wrapped talosctl commands shares the same file-shape contract as +// `talm template -I`. Without this, a node file produced by +// the in-place rewrite — which preserves operator comments above +// the modeline — would fail on the very next `talm apply -f` / +// `talm upgrade -f` call against the same file. +func TestProcessModelineAndUpdateGlobals_AcceptsLeadingComments(t *testing.T) { + origNodes := GlobalArgs.Nodes + origEndpoints := GlobalArgs.Endpoints + t.Cleanup(func() { + GlobalArgs.Nodes = origNodes + GlobalArgs.Endpoints = origEndpoints + }) + + dir := t.TempDir() + configFile := filepath.Join(dir, "node.yaml") + content := "# Operator note: reset 2026-05-12 after ticket OPS-1234\n" + + "# DO NOT edit values directly; modify values.yaml and re-template\n" + + "# talm: nodes=[\"10.0.0.1\"], endpoints=[\"10.0.0.1\"], templates=[\"templates/cp.yaml\"]\n" + + "machine:\n type: controlplane\n" + if err := os.WriteFile(configFile, []byte(content), 0o644); err != nil { + t.Fatal(err) + } + + GlobalArgs.Nodes = nil + GlobalArgs.Endpoints = nil + + templates, err := processModelineAndUpdateGlobals(configFile, false, false, true) + if err != nil { + t.Fatalf("modeline parse must accept leading operator comments; got error: %v", err) + } + + if len(templates) != 1 || templates[0] != "templates/cp.yaml" { + t.Errorf("templates = %v, want [templates/cp.yaml]", templates) + } + + if len(GlobalArgs.Nodes) != 1 || GlobalArgs.Nodes[0] != "10.0.0.1" { + t.Errorf("GlobalArgs.Nodes = %v, want [10.0.0.1]", GlobalArgs.Nodes) + } +} + func TestProcessModelineAndUpdateGlobals_NoTemplates(t *testing.T) { origNodes := GlobalArgs.Nodes origEndpoints := GlobalArgs.Endpoints diff --git a/pkg/commands/talosctl_wrapper.go b/pkg/commands/talosctl_wrapper.go index f7f23fd4..b8184abf 100644 --- a/pkg/commands/talosctl_wrapper.go +++ b/pkg/commands/talosctl_wrapper.go @@ -72,14 +72,14 @@ const rotateCACmdName = "rotate-ca" // //nolint:gochecknoglobals // package-level map keyed by name; cheap O(1) lookup in the propagation hot path. var rootShadowedPersistentFlags = map[string]struct{}{ - talosconfigName: {}, - "root": {}, - "context": {}, - "nodes": {}, - "endpoints": {}, - "cluster": {}, - "skip-verify": {}, - "version": {}, + talosconfigName: {}, + "root": {}, + "context": {}, + flagNameNodes: {}, + flagNameEndpoints: {}, + "cluster": {}, + "skip-verify": {}, + "version": {}, // siderov1-keys-dir is registered by upstream addCommand into // taloscommands.GlobalArgs.SideroV1KeysDir, but talm does not // model that auth path and never populates the field. If we diff --git a/pkg/commands/talosctl_wrapper_test.go b/pkg/commands/talosctl_wrapper_test.go index e8d9e166..0dc3f0ce 100644 --- a/pkg/commands/talosctl_wrapper_test.go +++ b/pkg/commands/talosctl_wrapper_test.go @@ -100,8 +100,8 @@ func TestPropagatePersistentFlags_RenamesShorthandF(t *testing.T) { } // TestWrapTalosCommand_InheritsParentPersistentFlags pins the -// structural contract behind #194: when an upstream parent registers -// a persistent flag, every wrapped child must surface that flag in +// structural contract: when an upstream parent registers a +// persistent flag, every wrapped child must surface that flag in // its effective flag set. cobra's mergePersistentFlags walks the // wrapped parent's PersistentFlags() at parse time; if the wrapper // only copies LOCAL flags, persistent ones from the upstream parent @@ -187,7 +187,7 @@ func TestWrapTalosCommand_RealImageListPropagatesNamespace(t *testing.T) { } // TestWrapCrashdumpCommand_PrepopulatesGlobalArgsNodes pins the -// contract for #180: when crashdump's per-class node flags +// contract: when crashdump's per-class node flags // (--init-node, --control-plane-nodes, --worker-nodes) are set // and GlobalArgs.Nodes is otherwise empty, the wrapper's PreRunE // populates GlobalArgs.Nodes from their union so the upstream @@ -266,8 +266,8 @@ func TestWrapCrashdumpCommand_DoesNotShadowExistingNodes(t *testing.T) { } // TestWrapKubeconfigCommand_PositionalPathErrorMessageMatchesContract -// pins the rewritten error message for #193. The previous wording -// claimed `use --login flag to pass arguments`, which conflated two +// pins the rewritten error message. The previous wording claimed +// `use --login flag to pass arguments`, which conflated two // distinct things: --login switches the kubeconfig target between // local and system, it does not pass a positional path. The new // message describes what the wrapper actually does (default writes @@ -333,8 +333,8 @@ func TestWrapKubeconfigCommand_PositionalPathErrorMessageMatchesContract(t *test } // TestWrapDmesgCommand_TailEqualsNumeric_RewritesError pins the -// FlagErrorFunc cushion for #195. Upstream talosctl registers -// --tail as a BoolVarP (toggling tail-mode for --follow), but +// FlagErrorFunc cushion for `--tail=` typo. Upstream talosctl +// registers --tail as a BoolVarP (toggling tail-mode for --follow), but // operators' first instinct is tail(1)-style line count. `--tail=3` // trips pflag's ParseBool and surfaces a cryptic // 'strconv.ParseBool: parsing "3": invalid syntax' error. @@ -401,8 +401,8 @@ func TestWrapDmesgCommand_TailEqualsNumeric_RewritesError(t *testing.T) { } // TestWrapTUICommand_NonTTY_RefusesWithHint pins the cushion for -// #183 across BOTH wrapped interactive-only commands (dashboard, -// edit). Each has a different upstream failure mechanism: +// non-TTY invocations across BOTH wrapped interactive-only commands +// (dashboard, edit). Each has a different upstream failure mechanism: // dashboard panics in tcell teardown, edit hangs in the kubectl // external-editor helper. The refusal here is the same shape for // both — clear cobra-surfaced error with operator-facing hint — diff --git a/pkg/commands/template.go b/pkg/commands/template.go index 5287acfe..32edb5d5 100644 --- a/pkg/commands/template.go +++ b/pkg/commands/template.go @@ -19,6 +19,7 @@ import ( "fmt" "os" "path/filepath" + "strings" "github.com/cockroachdb/errors" "github.com/cozystack/talm/pkg/engine" @@ -57,8 +58,16 @@ var templateCmdFlags struct { var templateCmd = &cobra.Command{ Use: "template", Short: "Render templates locally and display the output", - Long: ``, - Args: cobra.NoArgs, + Long: `Render Talos configuration templates locally. + +Multi-file invocation note: unlike ` + "`talm apply`" + ` (where the first -f +anchors and later -f files are stacked side-patches), +` + "`talm template -f a.yaml -f b.yaml -f c.yaml`" + ` renders each file +independently driven by its own modeline. This is intentional — +template's per-file render mirrors the "regenerate the artifact +beside the source" workflow, while apply's chain models "compose a +single MachineConfig and apply it once".`, + Args: cobra.NoArgs, PreRunE: func(cmd *cobra.Command, _ []string) error { templateCmdFlags.valueFiles = append(Config.TemplateOptions.ValueFiles, templateCmdFlags.valueFiles...) templateCmdFlags.values = append(Config.TemplateOptions.Values, templateCmdFlags.values...) @@ -179,7 +188,11 @@ func templateWithFiles(args []string) func(ctx context.Context, c *client.Client // mode. Splitting the per-file work out of templateWithFiles keeps // the outer function's cognitive complexity within the linter's gate. func templateOneFile(ctx context.Context, args []string, configFile string, firstFileProcessed *bool) error { - modelineConfig, err := modeline.ReadAndParseModeline(configFile) + // FindAndParseModeline accepts operator-authored comment / blank + // lines before the modeline so `talm template -I` can preserve + // them on rewrite. Anything other than `#`-prefixed comments or + // blanks before the modeline is still rejected. + leadingComments, modelineConfig, err := modeline.FindAndParseModeline(configFile) if err != nil { return errors.Wrap(err, "modeline parsing failed") } @@ -227,15 +240,20 @@ func templateOneFile(ctx context.Context, args []string, configFile string, firs ) } - tmpl := buildTemplateRunner(args, configFile, firstFileProcessed) + tmpl := buildTemplateRunner(args, configFile, leadingComments, firstFileProcessed) return runTemplate(ctx, tmpl) } // buildTemplateRunner returns the per-file render-and-emit closure. // Extracted so both templateOneFile and the dispatcher in runTemplate -// can stay flat. -func buildTemplateRunner(args []string, configFile string, firstFileProcessed *bool) func(ctx context.Context, c *client.Client) error { +// can stay flat. leadingComments is the slice of operator-authored +// `#`-prefixed / blank lines that lived above the modeline in the +// source file; in-place mode prepends them to the rewritten file so +// the operator's documentation survives the regeneration. +// Non in-place renders ignore leadingComments because the original +// file is left untouched. +func buildTemplateRunner(args []string, configFile string, leadingComments []string, firstFileProcessed *bool) func(ctx context.Context, c *client.Client) error { return func(ctx context.Context, c *client.Client) error { output, err := generateOutput(ctx, c, args) if err != nil { @@ -243,7 +261,7 @@ func buildTemplateRunner(args []string, configFile string, firstFileProcessed *b } if templateCmdFlags.inplace { - return writeInplaceRendered(configFile, output) + return writeInplaceRendered(configFile, prependLeadingComments(leadingComments, output)) } if *firstFileProcessed { @@ -448,12 +466,12 @@ func canonicalizeInsideRootPath(templatePath, absTemplatePath, absRootDir, relPa func init() { templateCmd.Flags().BoolVarP(&templateCmdFlags.insecure, "insecure", "i", false, "template using the insecure (encrypted with no auth) maintenance service") - templateCmd.Flags().StringSliceVarP(&templateCmdFlags.configFiles, "file", "f", nil, "specify config files for in-place update (can specify multiple)") + templateCmd.Flags().StringSliceVarP(&templateCmdFlags.configFiles, "file", "f", nil, "node config files for in-place update (`.yaml` / `.yml`; shell completion narrows to these extensions). Each file's modeline drives the per-file render.") templateCmd.Flags().BoolVarP(&templateCmdFlags.inplace, "in-place", "I", false, "re-template and update generated files in place (overwrite them)") templateCmd.Flags().StringSliceVarP(&templateCmdFlags.valueFiles, "values", "", []string{}, "specify values in a YAML file (can specify multiple)") templateCmd.Flags().StringSliceVarP(&templateCmdFlags.templateFiles, "template", "t", []string{}, "specify templates to render manifest from (can specify multiple)") - templateCmd.Flags().StringArrayVar(&templateCmdFlags.values, "set", []string{}, "set values on the command line (can specify multiple or separate values with commas: key1=val1,key2=val2)") - templateCmd.Flags().StringArrayVar(&templateCmdFlags.stringValues, "set-string", []string{}, "set STRING values on the command line (can specify multiple or separate values with commas: key1=val1,key2=val2)") + templateCmd.Flags().StringArrayVar(&templateCmdFlags.values, "set", []string{}, "set values on the command line (can specify multiple or separate values with commas: key1=val1,key2=val2). For IP / CIDR / version literals use --set-string — dots in --set values are interpreted as YAML key nesting.") + templateCmd.Flags().StringArrayVar(&templateCmdFlags.stringValues, "set-string", []string{}, "set STRING values on the command line (can specify multiple or separate values with commas: key1=val1,key2=val2). Use for IP addresses, CIDR blocks, version strings, or any literal value where dots must NOT be interpreted as YAML key nesting.") templateCmd.Flags().StringArrayVar(&templateCmdFlags.fileValues, "set-file", []string{}, "set values from respective files specified via the command line (can specify multiple or separate values with commas: key1=path1,key2=path2)") templateCmd.Flags().StringArrayVar(&templateCmdFlags.jsonValues, "set-json", []string{}, "set JSON values on the command line (can specify multiple or separate values with commas: key1=jsonval1,key2=jsonval2)") templateCmd.Flags().StringArrayVar(&templateCmdFlags.literalValues, "set-literal", []string{}, "set a literal STRING value on the command line") @@ -464,9 +482,42 @@ func init() { templateCmd.Flags().BoolVarP(&templateCmdFlags.offline, "offline", "", false, "disable gathering information and lookup functions") templateCmd.Flags().StringVar(&templateCmdFlags.kubernetesVersion, "kubernetes-version", constants.DefaultKubernetesVersion, "desired kubernetes version to run") + // Shell completion for `talm template` flags. `--file` uses the + // modelined-yaml lister (same as apply); other yaml-shaped flags + // fall back to the generic extension filter. ValidArgsFunction + // is NOT wired because templateCmd declares cobra.NoArgs, which + // suppresses positional completion in cobra's __complete path. + _ = templateCmd.RegisterFlagCompletionFunc("file", completeNodeFiles) + _ = templateCmd.RegisterFlagCompletionFunc("values", completeYAMLFiles) + _ = templateCmd.RegisterFlagCompletionFunc("template", completeYAMLFiles) + _ = templateCmd.RegisterFlagCompletionFunc("with-secrets", completeYAMLFiles) + addCommand(templateCmd) } +// prependLeadingComments emits the operator-authored `#`-prefixed +// / blank lines that lived above the modeline in the source file +// (returned by modeline.FindAndParseModeline) back at the top of +// the rewritten output. Without this, `talm template -I` would +// silently strip the leading comment block on every rewrite. +// Returns output unchanged when leading is empty (the conventional +// case — modeline on line 1). +func prependLeadingComments(leading []string, output string) string { + if len(leading) == 0 { + return output + } + + var b strings.Builder + for _, line := range leading { + b.WriteString(line) + b.WriteByte('\n') + } + + b.WriteString(output) + + return b.String() +} + // writeInplaceRendered writes the rendered template output over the // node config file. Routes through secureperm because the rendered // machine config embeds certs, PKI keys, and cluster join tokens — diff --git a/pkg/commands/upgrade_handler.go b/pkg/commands/upgrade_handler.go index 6dfcb846..b722207c 100644 --- a/pkg/commands/upgrade_handler.go +++ b/pkg/commands/upgrade_handler.go @@ -22,9 +22,7 @@ import ( "time" "github.com/cockroachdb/errors" - "github.com/cozystack/talm/pkg/engine" "github.com/siderolabs/talos/pkg/machinery/client" - "github.com/siderolabs/talos/pkg/machinery/config/configloader" "github.com/spf13/cobra" ) @@ -74,14 +72,40 @@ func validatePostUpgradeReconcileWindow(window time.Duration) error { // wrapUpgradeCommand adds special handling for upgrade command: extract image from config and set --image flag // -//nolint:gocognit,gocyclo,cyclop,funlen,nestif // cobra wrapper branching over (image extraction, file paths, modeline) for the upgrade flow; each branch is short. +//nolint:gocognit,gocyclo,cyclop,funlen // cobra wrapper branching over (image extraction, file paths, modeline) for the upgrade flow; each branch is short. func wrapUpgradeCommand(wrappedCmd *cobra.Command, originalRunE func(*cobra.Command, []string) error) { + // Extend the upstream Long with talm-specific behaviour so + // `talm upgrade --help` describes the actual image-resolution + // chain (values.yaml, --image override) instead of just the + // generic upstream upgrade flow. + wrappedCmd.Long = `Upgrade Talos on the target node(s). + +Image resolution (when -f is provided): + - --image takes precedence and is used as-is. + - otherwise, talm reads ` + "`image:`" + ` from values.yaml at the + project root and passes it as the upgrade target. Bumping + values.yaml::image is the canonical "raise the cluster's + Talos version" workflow — re-running ` + "`talm template`" + ` to + refresh node files first is NOT required. + +The first -f file anchors the project root (Chart.yaml + +secrets.yaml); its modeline supplies the nodes / endpoints. The +node body's machine.install.image is no longer consulted by the +upgrade flow.` + wrappedCmd.Flags().BoolVar(&upgradeCmdFlags.skipPostUpgradeVerify, "skip-post-upgrade-verify", false, - "skip the post-upgrade check that compares running Talos version against the target image's tag (Phase 2C; detects silent A/B rollback per #175)") + "skip the post-upgrade check that compares running Talos version against the target image's tag (detects silent A/B rollback after the RPC acks success)") wrappedCmd.Flags().DurationVar(&upgradeCmdFlags.postUpgradeReconcileWindow, "post-upgrade-reconcile-window", defaultPostUpgradeReconcileWindow, "how long to wait after upgrade returns before re-reading the running version; widen for slow hardware / large image pulls") + // Shell completion for `talm upgrade --file`: returns modelined + // yaml files under /nodes/. ValidArgsFunction is NOT + // wired because upstream's upgrade command declares no + // positional args; cobra's __complete path suppresses + // ValidArgsFunction when the arg-constraint is NoArgs. + _ = wrappedCmd.RegisterFlagCompletionFunc("file", completeNodeFiles) + wrappedCmd.RunE = func(cmd *cobra.Command, args []string) error { // Fail-fast on a bad --post-upgrade-reconcile-window BEFORE // any talosctl upgrade RPC fires. A zero / negative value @@ -116,12 +140,21 @@ func wrapUpgradeCommand(wrappedCmd *cobra.Command, originalRunE func(*cobra.Comm return err } - // If config files are provided and --image flag is not set, extract image from config + // If config files are provided and --image flag is not set, + // resolve the upgrade target image from values.yaml. + // values.yaml is the source of truth for cluster-wide knobs; + // the upgrade target reads from there directly rather than + // re-extracting from a rendered node body (which would pin + // the image to whatever was templated last, ignoring later + // values.yaml bumps). Per-node image override remains + // possible by passing --image explicitly. if len(filesToProcess) > 0 && !cmd.Flags().Changed("image") { - // Process first config file to extract image + // Process modeline so GlobalArgs.Nodes / .Endpoints are + // populated for the downstream talosctl invocation; we + // no longer use the modeline templates list, but the + // nodes/endpoints carry on. configFile := filesToProcess[0] - // Process modeline to update GlobalArgs nodesFromArgs := len(GlobalArgs.Nodes) > 0 endpointsFromArgs := len(GlobalArgs.Endpoints) > 0 @@ -129,69 +162,18 @@ func wrapUpgradeCommand(wrappedCmd *cobra.Command, originalRunE func(*cobra.Comm return errors.Wrap(err, "failed to process modeline") } - // Get talos-version, with-secrets, kubernetes-version from flags or config - talosVersion := Config.TemplateOptions.TalosVersion - - if cmd.Flags().Changed("talos-version") { - if val, err := cmd.Flags().GetString("talos-version"); err == nil { - talosVersion = val - } - } - - withSecrets := Config.TemplateOptions.WithSecrets - - if cmd.Flags().Changed("with-secrets") { - if val, err := cmd.Flags().GetString("with-secrets"); err == nil { - withSecrets = val - } - } - - // Resolve secrets.yaml path relative to project root if not absolute - withSecrets = ResolveSecretsPath(withSecrets) - - kubernetesVersion := Config.TemplateOptions.KubernetesVersion - - if cmd.Flags().Changed("kubernetes-version") { - if val, err := cmd.Flags().GetString("kubernetes-version"); err == nil { - kubernetesVersion = val - } - } - - // Process config to extract image - eopts := engine.Options{ - TalosVersion: talosVersion, - WithSecrets: withSecrets, - KubernetesVersion: kubernetesVersion, - } - - patches := []string{"@" + configFile} - - configBundle, machineType, err := engine.FullConfigProcess(eopts, patches) - if err != nil { - return errors.Wrap(err, "full config processing error") - } - - result, err := engine.SerializeConfiguration(configBundle, machineType) + image, err := resolveUpgradeImageFromValues(Config.RootDir) if err != nil { - return errors.Wrap(err, "error serializing configuration") - } - - config, err := configloader.NewFromBytes(result) - if err != nil { - return errors.Wrap(err, "error loading config") - } - - image := config.Machine().Install().Image() - if image == "" { - return errors.New("error getting image from config") + return err } - // Set --image flag with extracted image if err := cmd.Flags().Set("image", image); err != nil { - // Flag might not exist, ignore error + // Flag might not exist (extremely unlikely given + // upgradeCmd registers it); fall through with a + // warning rather than aborting. fmt.Fprintf(os.Stderr, "Warning: failed to set --image flag: %v\n", err) } else { - fmt.Fprintf(os.Stderr, "Using image from config: %s\n", image) + fmt.Fprintf(os.Stderr, "Using image from values.yaml: %s\n", image) } } @@ -223,8 +205,8 @@ func wrapUpgradeCommand(wrappedCmd *cobra.Command, originalRunE func(*cobra.Comm } // Phase 2C: post-upgrade version verify. Detects the silent - // auto-rollback case (#175): talosctl upgrade acks the RPC, - // Talos pulls + writes the new install, A/B boot fails its + // auto-rollback case: talosctl upgrade acks the RPC, Talos + // pulls + writes the new install, A/B boot fails its // readiness check, Talos rolls back to the prior partition, // and the operator's "successful" upgrade silently no-ops. // Skip predicate documents the cases where this gate cannot diff --git a/pkg/commands/upgrade_handler_test.go b/pkg/commands/upgrade_handler_test.go index 3a680918..454f9883 100644 --- a/pkg/commands/upgrade_handler_test.go +++ b/pkg/commands/upgrade_handler_test.go @@ -267,15 +267,15 @@ func TestRunPostUpgradeVersionVerifyInner_NonEmptyNodes_WaitsAndVerifies(t *test } // TestDefaultPostUpgradeReconcileWindow_Is90s pins back-compat for -// #190: the upgrade flow defaulted to a hard-coded 90s wait before -// the flag was introduced; the new --post-upgrade-reconcile-window -// must register the same value as its default so operators who -// never pass the flag observe byte-identical timing. +// the --post-upgrade-reconcile-window default: the upgrade flow +// defaulted to a hard-coded 90s wait before the flag was +// introduced; the flag must register the same value as its default +// so operators who never pass it observe byte-identical timing. func TestDefaultPostUpgradeReconcileWindow_Is90s(t *testing.T) { t.Parallel() if defaultPostUpgradeReconcileWindow != 90*time.Second { - t.Errorf("default reconcile window changed: got %s, want 90s — back-compat regression (#190)", defaultPostUpgradeReconcileWindow) + t.Errorf("default reconcile window changed: got %s, want 90s — back-compat regression", defaultPostUpgradeReconcileWindow) } } @@ -370,6 +370,24 @@ func TestValidatePostUpgradeReconcileWindow_PositiveAccepted(t *testing.T) { } } +// TestWrapUpgradeCommand_LongDocumentsImageResolution pins the +// operator-facing description of where the upgrade target image +// comes from. `talm upgrade --help` must explain the +// values.yaml → --image override chain so operators don't have to +// read the source. Without this pin a doc-removal refactor would +// silently revert the upgrade flow to "you have to guess where +// --image came from". +func TestWrapUpgradeCommand_LongDocumentsImageResolution(t *testing.T) { + cmd := &cobra.Command{Use: upgradeCmdName} + wrapUpgradeCommand(cmd, func(_ *cobra.Command, _ []string) error { return nil }) + + for _, want := range []string{"values.yaml", "--image"} { + if !strings.Contains(cmd.Long, want) { + t.Errorf("upgrade Long must name %q in the image-resolution explanation; got:\n%s", want, cmd.Long) + } + } +} + // TestWrapUpgradeCommand_BadReconcileWindow_FailsFastBeforeOriginalRunE // pins fail-fast semantics on the reconcile-window flag: a zero or // negative value must reject BEFORE the talosctl upgrade RPC fires. @@ -427,14 +445,12 @@ func TestWrapUpgradeCommand_BadReconcileWindow_FailsFastBeforeOriginalRunE(t *te // TestReadmePostUpgradeVerify_NoHardcoded90s mirrors // TestPostUpgradeVersionMismatchHint_NoHardcoded90s for the -// operator-facing README. The pre-#190 README claimed "waits 90s -// for the node to finish booting"; after #190 the window is -// operator-tunable via --post-upgrade-reconcile-window. The README -// bullet must reference "the configured reconcile window" rather -// than a literal 90s so an operator running with a custom window -// does not read contradictory documentation. Pin the absence of -// the literal so a future README edit re-introducing it fails this -// test. +// operator-facing README. An earlier copy claimed "waits 90s for +// the node to finish booting"; the window is now operator-tunable +// via --post-upgrade-reconcile-window, so the README must reference +// "the configured reconcile window" rather than a literal 90s. +// Pin the absence of the literal so a future README edit +// re-introducing it fails this test. func TestReadmePostUpgradeVerify_NoHardcoded90s(t *testing.T) { t.Parallel() diff --git a/pkg/commands/upgrade_image_source.go b/pkg/commands/upgrade_image_source.go new file mode 100644 index 00000000..3b53c913 --- /dev/null +++ b/pkg/commands/upgrade_image_source.go @@ -0,0 +1,63 @@ +// Copyright Cozystack Authors +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package commands + +import ( + "os" + "path/filepath" + + "github.com/cockroachdb/errors" + "gopkg.in/yaml.v3" +) + +// resolveUpgradeImageFromValues reads the cluster-default installer +// image from values.yaml. values.yaml is the source of truth for +// cluster-wide knobs; the upgrade target reads from there directly +// instead of re-extracting from the rendered node body (which would +// return whatever image was baked into the last `talm template` +// run, silently ignoring later values.yaml edits). +// +// Per-node image override is still available via --image; the +// resolver only fires when --image is unset. +func resolveUpgradeImageFromValues(rootDir string) (string, error) { + valuesPath := filepath.Join(rootDir, "values.yaml") + + data, err := os.ReadFile(valuesPath) + if err != nil { + //nolint:wrapcheck // cockroachdb/errors.WithHint attaches operator-facing guidance. + return "", errors.WithHintf( + errors.Wrapf(err, "reading values.yaml from project root %s", rootDir), + "talm upgrade resolves the target installer image from values.yaml; the project root is detected from the first -f file's directory walking up to the nearest Chart.yaml + secrets.yaml, or from --root explicitly. Ensure the -f file lives inside a `talm init`'d project, pass --root , or pass --image to skip resolution entirely.", + ) + } + + var values struct { + Image string `yaml:"image"` + } + + if err := yaml.Unmarshal(data, &values); err != nil { + return "", errors.Wrapf(err, "parsing values.yaml at %s", valuesPath) + } + + if values.Image == "" { + //nolint:wrapcheck // cockroachdb/errors.WithHint attaches operator-facing guidance. + return "", errors.WithHint( + errors.New("image not set in values.yaml"), + "set `image: ` in values.yaml (the cluster-wide default) or pass --image to override per-invocation", + ) + } + + return values.Image, nil +} diff --git a/pkg/engine/contract_network_multidoc_test.go b/pkg/engine/contract_network_multidoc_test.go index 787d0e4b..def51d9b 100644 --- a/pkg/engine/contract_network_multidoc_test.go +++ b/pkg/engine/contract_network_multidoc_test.go @@ -265,8 +265,8 @@ func TestContract_NetworkMultidoc_BondWithoutSlaves_SkipsBondConfig_Generic(t *t // TestContract_NetworkMultidoc_BondWithoutSlaves_WithVIP_SkipsBondConfig // pins the empty-slaves guardrail against the VIP-active code path. -// Field report (#9327 / #9351): an operator on v0.27.0 saw an empty -// BondConfig emitted "only when VIP is in use" — the existing +// Field report: an operator saw an empty BondConfig emitted "only +// when VIP is in use" — the existing // TestContract_NetworkMultidoc_BondWithoutSlaves_SkipsBondConfig pins // the no-VIP shape, so a hidden VIP-conditional branch could still // emit BondConfig{links: } without breaking the existing diff --git a/pkg/engine/contract_setvalue_warning_test.go b/pkg/engine/contract_setvalue_warning_test.go new file mode 100644 index 00000000..cd935d23 --- /dev/null +++ b/pkg/engine/contract_setvalue_warning_test.go @@ -0,0 +1,243 @@ +// Copyright Cozystack Authors +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package engine + +import ( + "bytes" + "strings" + "testing" +) + +// withCapturedSetValueWarnings redirects setValueWarningWriter to a +// caller-supplied buffer for the duration of the test, restoring the +// original writer via t.Cleanup. The package-level writer defaults +// to os.Stderr, so tests must isolate it to read the warning stream. +func withCapturedSetValueWarnings(t *testing.T) *bytes.Buffer { + t.Helper() + + buf := &bytes.Buffer{} + prev := setValueWarningWriter + setValueWarningWriter = buf + + t.Cleanup(func() { + setValueWarningWriter = prev + }) + + return buf +} + +// TestLoadValues_IPShapedSetValue_EmitsWarning pins the operator- +// footgun guard added for `--set =`. Helm's strvals.ParseInto +// interprets dots in the RHS as YAML key nesting, so +// `--set endpoint=192.168.1.1` produces the map +// `{endpoint: {192: {168: {1: 1}}}}` — silently corrupt config the +// operator notices only when the rendered manifest is wrong. The +// warning steers them to `--set-string` BEFORE the bad render lands. +func TestLoadValues_IPShapedSetValue_EmitsWarning(t *testing.T) { + buf := withCapturedSetValueWarnings(t) + + if _, err := loadValues(Options{Values: []string{"endpoint=192.168.1.1"}}); err != nil { + t.Fatalf("loadValues should succeed even when the value is IP-shaped (warning, not fatal); got: %v", err) + } + + out := buf.String() + if !strings.Contains(out, "endpoint=192.168.1.1") { + t.Errorf("warning must echo the offending key=value so the operator can correlate; got:\n%s", out) + } + if !strings.Contains(out, "--set-string") { + t.Errorf("warning must point at --set-string as the fix; got:\n%s", out) + } +} + +// TestLoadValues_CIDRShapedSetValue_EmitsWarning extends the IPv4 +// case to a CIDR. The same nesting trap applies: dots in the RHS +// stay dots; the `/24` suffix is irrelevant to strvals' parser. +func TestLoadValues_CIDRShapedSetValue_EmitsWarning(t *testing.T) { + buf := withCapturedSetValueWarnings(t) + + if _, err := loadValues(Options{Values: []string{"subnet=10.0.0.0/24"}}); err != nil { + t.Fatalf("loadValues: %v", err) + } + + if !strings.Contains(buf.String(), "--set-string") { + t.Errorf("CIDR value must trigger the same --set-string warning; got:\n%s", buf.String()) + } +} + +// TestLoadValues_ColonSeparatedLiteral_NoWarning pins the no- +// false-positive contract for colon-separated values. The warning +// targets the strvals dot-nesting trap; colons are not strvals +// separators, so `--set startTime=12:34:56`, +// `--set mac=00:11:22:33:44:55`, and IPv6 literals must not emit +// a warning whose copy says "dots are interpreted as YAML key +// nesting". +func TestLoadValues_ColonSeparatedLiteral_NoWarning(t *testing.T) { + cases := []string{ + "startTime=12:34:56", + "mac=00:11:22:33:44:55", + "endpoint=2001:db8::1", + "hex=ABCD:1234:EF56", + } + + for _, v := range cases { + t.Run(v, func(t *testing.T) { + buf := withCapturedSetValueWarnings(t) + + if _, err := loadValues(Options{Values: []string{v}}); err != nil { + t.Fatalf("loadValues: %v", err) + } + + if buf.Len() > 0 { + t.Errorf("colon-separated literal %q must not emit IP-shape warning (colons are not strvals separators); got:\n%s", v, buf.String()) + } + }) + } +} + +// TestLoadValues_SemverShapedSetValue_EmitsWarning covers the +// version-string case (`v1.13.0` or `1.13.0`). Same root cause: +// dots become nesting separators. +func TestLoadValues_SemverShapedSetValue_EmitsWarning(t *testing.T) { + cases := []string{ + "image.tag=v1.13.0", // v-prefixed three-component + "image.tag=v1.13", // v-prefixed two-component + "image.tag=1.13.0", // bare three-component + } + + for _, v := range cases { + t.Run(v, func(t *testing.T) { + buf := withCapturedSetValueWarnings(t) + + if _, err := loadValues(Options{Values: []string{v}}); err != nil { + t.Fatalf("loadValues: %v", err) + } + + if !strings.Contains(buf.String(), "--set-string") { + t.Errorf("semver-shaped value must trigger the --set-string warning; got:\n%s", buf.String()) + } + }) + } +} + +// TestLoadValues_BareDecimalSetValue_NoWarning pins the negative +// contract for two-component bare decimals: `cpu=1.5`, +// `weight=2.0`, and similar plain-numeric values that share the +// "single dot, digits on both sides" shape with semver-two but +// carry no operator-intent signal toward strvals nesting. Without +// this guard the warning generates noise on every Helm chart that +// uses `--set cpu=1.5` to override resource requests. +func TestLoadValues_BareDecimalSetValue_NoWarning(t *testing.T) { + cases := []string{ + "cpu=1.5", + "weight=2.0", + "ratio=0.95", + } + + for _, v := range cases { + t.Run(v, func(t *testing.T) { + buf := withCapturedSetValueWarnings(t) + + if _, err := loadValues(Options{Values: []string{v}}); err != nil { + t.Fatalf("loadValues: %v", err) + } + + if buf.Len() > 0 { + t.Errorf("bare decimal %q must not emit semver warning; got:\n%s", v, buf.String()) + } + }) + } +} + +// TestLoadValues_InvalidIPv4OctetSetValue_NoWarning pins that the +// IPv4 detector rejects out-of-range octets. `999.999.999.999` +// shares the four-dotted-numeric shape with a real IPv4 literal +// but is overwhelmingly likely to be a chart-internal magic +// number, not an accidental IP literal. Tightening the regex +// avoids the false positive at zero cost to legitimate IPv4 +// detection. +func TestLoadValues_InvalidIPv4OctetSetValue_NoWarning(t *testing.T) { + buf := withCapturedSetValueWarnings(t) + + if _, err := loadValues(Options{Values: []string{"magic=999.999.999.999"}}); err != nil { + t.Fatalf("loadValues: %v", err) + } + + if buf.Len() > 0 { + t.Errorf("invalid-octet four-dotted-numeric must not trigger IPv4 warning; got:\n%s", buf.String()) + } +} + +// TestLoadValues_IPShapedSetStringValue_NoWarning pins the negative +// contract: when the operator already uses --set-string, no warning +// fires. The guard is for the `--set` footgun specifically. +func TestLoadValues_IPShapedSetStringValue_NoWarning(t *testing.T) { + buf := withCapturedSetValueWarnings(t) + + if _, err := loadValues(Options{StringValues: []string{"endpoint=192.168.1.1"}}); err != nil { + t.Fatalf("loadValues: %v", err) + } + + if buf.Len() > 0 { + t.Errorf("--set-string must not emit the IP-shape warning; got:\n%s", buf.String()) + } +} + +// TestLoadValues_PlainSetValue_NoWarning pins the no-false-positive +// contract: plain string / numeric / bool values must not trigger +// the warning. The detector matches IP / CIDR / semver shapes only. +func TestLoadValues_PlainSetValue_NoWarning(t *testing.T) { + cases := []string{ + "clusterName=test", + "count=5", + "enabled=true", + "image.repository=ghcr.io/foo/bar", + } + + for _, v := range cases { + t.Run(v, func(t *testing.T) { + buf := withCapturedSetValueWarnings(t) + + if _, err := loadValues(Options{Values: []string{v}}); err != nil { + t.Fatalf("loadValues: %v", err) + } + + if buf.Len() > 0 { + t.Errorf("plain value %q must not emit IP-shape warning; got:\n%s", v, buf.String()) + } + }) + } +} + +// TestLoadValues_ChainedSetValue_WarnsOnEachIPLiteral covers Helm's +// comma-chained shorthand: `--set k1=v1,k2=v2`. Each comma-segment +// is parsed independently and each must be screened independently. +func TestLoadValues_ChainedSetValue_WarnsOnEachIPLiteral(t *testing.T) { + buf := withCapturedSetValueWarnings(t) + + if _, err := loadValues(Options{Values: []string{"a=1.2.3.4,b=plain,c=5.6.7.8"}}); err != nil { + t.Fatalf("loadValues: %v", err) + } + + out := buf.String() + if !strings.Contains(out, "a=1.2.3.4") { + t.Errorf("first IP literal must be flagged; got:\n%s", out) + } + if !strings.Contains(out, "c=5.6.7.8") { + t.Errorf("third IP literal must be flagged; got:\n%s", out) + } + if strings.Contains(out, "b=plain") { + t.Errorf("plain pair must not be flagged; got:\n%s", out) + } +} diff --git a/pkg/engine/engine.go b/pkg/engine/engine.go index 26a67186..00aa92d4 100644 --- a/pkg/engine/engine.go +++ b/pkg/engine/engine.go @@ -1692,6 +1692,13 @@ func loadValues(opts Options) (map[string]any, error) { base = mergeMaps(base, currentMap) } + // Screen --set values for IP / CIDR / version literals BEFORE + // strvals.ParseInto chews them. The parser interprets dots in + // the RHS as YAML key nesting, so `--set endpoint=10.0.0.1` + // produces `{endpoint: {10: {0: {0: 1}}}}` — silently corrupt + // config. The warning steers operators to --set-string. + screenSetValuesForCoercion(opts.Values) + // Parse and merge values from --set for _, value := range opts.Values { err := strvals.ParseInto(value, base) diff --git a/pkg/engine/engine_test.go b/pkg/engine/engine_test.go index d8cee0b6..c6631f63 100644 --- a/pkg/engine/engine_test.go +++ b/pkg/engine/engine_test.go @@ -282,7 +282,7 @@ func TestExtractExtraDocuments(t *testing.T) { wantErr bool }{ { - name: "issue #66 scenario - talos config with UserVolumeConfig", + name: "talos config with UserVolumeConfig as second document", patches: []string{"machine:\n type: worker\ncluster:\n name: test\n---\napiVersion: v1alpha1\nkind: UserVolumeConfig\nname: databig"}, wantTalos: 1, wantExtra: 1, diff --git a/pkg/engine/helm/funcs_test.go b/pkg/engine/helm/funcs_test.go index 17e511e5..02747e23 100644 --- a/pkg/engine/helm/funcs_test.go +++ b/pkg/engine/helm/funcs_test.go @@ -96,7 +96,9 @@ func TestFuncs(t *testing.T) { expect: `[error unmarshaling JSON: while decoding JSON: json: cannot unmarshal object into Go value of type []interface {}]`, vars: `hello: world`, }, { - // This should never result in a network lookup. Regression for #7955 + // This should never result in a network lookup; lookup on an + // unconnected engine must return empty rather than attempt + // kubeapi discovery. tpl: `{{ lookup "v1" "Namespace" "" "unlikelynamespace99999999" }}`, expect: `map[]`, vars: `["one", "two"]`, diff --git a/pkg/engine/render_test.go b/pkg/engine/render_test.go index 2a736282..4a2d615a 100644 --- a/pkg/engine/render_test.go +++ b/pkg/engine/render_test.go @@ -1193,7 +1193,7 @@ func TestMultiDocGeneric_VlanOnBondTopology(t *testing.T) { // each carrying a default route. eth0 is in `table=main`, eth1 is in a non-main // table (e.g. an alternate routing table) and a third interface has a main-table // route that should be ignored once the first match is taken. Used to verify -// `default_link_name_by_gateway` (#108) returns a single deterministic value +// `default_link_name_by_gateway` returns a single deterministic value // rather than concatenating link names from every matching route. func multiNicMultipleDefaultRoutesLookup() func(string, string, string) (map[string]any, error) { eth0 := map[string]any{ @@ -1311,10 +1311,11 @@ func multiNicMultipleDefaultRoutesLookup() func(string, string, string) (map[str } } -// TestDefaultLinkByGatewayHelpers_MultiNIC is a regression test for #108. -// When a node has multiple default routes (typical for DHCP on multi-NIC -// machines), the helpers historically iterated all matches and concatenated -// the outputs (e.g. `eth0eth1eth2`) and didn't filter by `table=main`. +// TestDefaultLinkByGatewayHelpers_MultiNIC pins the deterministic +// single-link result on multi-default-route nodes. When a node has +// multiple default routes (typical for DHCP on multi-NIC machines), +// an earlier helper iterated all matches and concatenated the +// outputs (e.g. `eth0eth1eth2`) and didn't filter by `table=main`. // After the fix the helpers must: // - filter routes by table=main // - return exactly one value (the first matching route) @@ -1346,10 +1347,10 @@ bus={{ include "talm.discovered.default_link_bus_by_gateway" . }} assertNotContains(t, output, "eth2") } -// secondaryNicLookup emulates a node with two physical NICs (eth0 primary -// with a default route, eth1 storage with a static subnet route and no -// default) plus a bond master link. Used to exercise the multi-NIC discovery -// helpers added for #125. +// secondaryNicLookup emulates a node with two physical NICs (eth0 +// primary with a default route, eth1 storage with a static subnet +// route and no default) plus a bond master link. Exercises the +// per-link multi-NIC discovery helpers. func secondaryNicLookup() func(string, string, string) (map[string]any, error) { eth0 := map[string]any{ "metadata": map[string]any{"id": "eth0"}, @@ -1488,10 +1489,10 @@ func secondaryNicLookup() func(string, string, string) (map[string]any, error) { } } -// TestSecondaryNicHelpers covers the per-link helpers added for #125. They -// expose every physical NIC (not just the primary one carrying the default -// route) so user templates can configure secondary uplinks (e.g. storage -// network on a control-plane). +// TestSecondaryNicHelpers covers the per-link discovery helpers +// that expose every physical NIC (not just the primary one carrying +// the default route) so user templates can configure secondary +// uplinks (e.g. storage network on a control-plane). func TestSecondaryNicHelpers(t *testing.T) { const tmpl = `physical={{ include "talm.discovered.physical_link_names" . }} configurable={{ include "talm.discovered.configurable_link_names" . }} @@ -1732,11 +1733,12 @@ func TestCozystackChartRendersIPv4GatewayOnDualStack(t *testing.T) { }) } -// TestNetworkMultidoc_NoDiscovery is a regression test for #58. When discovery -// returns no default route (offline render, isolated node, custom networking), -// the multidoc cozystack template must NOT emit a LinkConfig/BondConfig/ -// VLANConfig/Layer2VIPConfig with empty `name:` — Talos v1.12 rejects such -// documents with `[networking.os.device.interface] required`. +// TestNetworkMultidoc_NoDiscovery pins the empty-name guard: when +// discovery returns no default route (offline render, isolated +// node, custom networking), the multidoc cozystack template must +// NOT emit a LinkConfig/BondConfig/VLANConfig/Layer2VIPConfig with +// empty `name:` — Talos v1.12 rejects such documents with +// `[networking.os.device.interface] required`. func TestNetworkMultidoc_NoDiscovery(t *testing.T) { output := renderChartTemplate(t, "../../charts/cozystack", "templates/controlplane.yaml", "v1.12") @@ -2163,10 +2165,11 @@ func TestMultiDocLinkConfigCarriesMTU(t *testing.T) { } } -// TestNetworkLegacy_NoDiscovery is a regression test for #58 covering the -// legacy (Talos < v1.12) path. The `interfaces:` key was unconditionally -// emitted, producing either an empty list or a `- interface:` block with an -// empty interface name when discovery found no default route. Either form +// TestNetworkLegacy_NoDiscovery pins the empty-interface guard on +// the legacy (Talos < v1.12) path. The `interfaces:` key was +// unconditionally emitted, producing either an empty list or a +// `- interface:` block with an empty interface name when discovery +// found no default route. Either form // breaks Talos validation. After the fix, the template must skip both // `interfaces:` and the per-interface block entirely. func TestNetworkLegacy_NoDiscovery(t *testing.T) { diff --git a/pkg/engine/setvalue_warn.go b/pkg/engine/setvalue_warn.go new file mode 100644 index 00000000..ed40ad2f --- /dev/null +++ b/pkg/engine/setvalue_warn.go @@ -0,0 +1,117 @@ +// Copyright Cozystack Authors +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package engine + +import ( + "fmt" + "io" + "os" + "regexp" + "strings" +) + +// setValueWarningWriter is the sink for the operator-facing warning +// emitted by loadValues when a `--set` value's RHS looks like an IP, +// CIDR, or version literal — shapes that strvals.ParseInto would +// interpret as nested map keys, silently corrupting the rendered +// config. Defaulted to os.Stderr; redirected in tests via +// withCapturedSetValueWarnings. +// +//nolint:gochecknoglobals // package-level writer is the standard Go pattern for test-overridable side-channel output; an Options-borne writer would propagate through the public engine API for a single internal-use warning. +var setValueWarningWriter io.Writer = os.Stderr + +// ipShapedValueRe matches a valid IPv4 literal or IPv4 CIDR. dots +// in these shapes are not separators (they belong to the address) +// but strvals.ParseInto reads them as YAML key nesting. The per- +// octet alternation rejects out-of-range numbers (256.x.y.z) so +// the heuristic does not flag arbitrary four-dotted numeric strings +// like 999.999.999.999 — which carry no operator-intent signal. +var ipShapedValueRe = regexp.MustCompile(`^(?:25[0-5]|2[0-4]\d|1?\d?\d)(?:\.(?:25[0-5]|2[0-4]\d|1?\d?\d)){3}(?:/(?:3[0-2]|[12]?\d))?$`) + +// semverShapedValueRe matches semantic-version literals: either a +// `v`-prefixed two-or-three-component form (`v1.13`, `v1.13.0`) or +// a bare three-component form (`1.13.0`). A bare two-component form +// (`1.5`, `2.0`) is deliberately NOT matched — those shapes are +// indistinguishable from plain decimal values like +// `cpu=1.5` / `weight=2.0` and would generate noise. Same dot-as- +// nesting trap as the IP shape; bare two-component decimals are +// the common false-positive class. +var semverShapedValueRe = regexp.MustCompile(`^(?:v\d+\.\d+(?:\.\d+)?|\d+\.\d+\.\d+)$`) + +// looksLikeAccidentalSetCoercion screens a single `key=value` pair +// for the canonical strvals footgun shapes. Returns true when the +// RHS contains dots that strvals.ParseInto would interpret as YAML +// key nesting against the operator's intent. IPv6 / MAC / colon- +// separated literals are deliberately NOT flagged: colons are not +// strvals separators, so `--set startTime=12:34:56` and +// `--set mac=00:11:22:33:44:55` do not hit the nesting trap and +// must not emit a misleading "use --set-string" warning. +func looksLikeAccidentalSetCoercion(pair string) bool { + // Split on the first '=' only; chained values are screened + // upstream by splitChainedSetValue. A pair without '=' + // can't have a footgun RHS to flag. + _, value, ok := strings.Cut(pair, "=") + if !ok || value == "" { + return false + } + + switch { + case ipShapedValueRe.MatchString(value): + return true + case semverShapedValueRe.MatchString(value): + return true + } + + return false +} + +// splitChainedSetValue splits a chained --set value +// (`k1=v1,k2=v2,k3=v3`) into its constituent pairs. The actual Helm +// strvals parser handles escape semantics; this helper exists only +// to screen each pair for the IP-shape footgun before strvals chews +// the string. A simple Split on ',' is enough for the canonical +// shapes the warning targets — operators with embedded commas in +// values should already be reaching for --set-literal or --set-json. +func splitChainedSetValue(value string) []string { + return strings.Split(value, ",") +} + +// emitSetValueCoercionWarning writes the operator-facing warning to +// setValueWarningWriter for the offending pair. Format matches the +// rest of the cobra error / hint conventions ("talm:" prefix, one +// line per warning). +func emitSetValueCoercionWarning(pair string) { + fmt.Fprintf(setValueWarningWriter, + "talm: --set %s looks like an IP / CIDR / version literal; "+ + "strvals.ParseInto interprets dots as YAML key nesting, "+ + "so the rendered value will be a nested map. Pass --set-string %s "+ + "to keep the literal verbatim.\n", + pair, pair, + ) +} + +// screenSetValuesForCoercion walks every comma-separated pair in +// every --set entry and emits a warning per footgun-shape match. +// The screen is non-fatal: parsing proceeds as before, the warning +// is the only behavioural change. +func screenSetValuesForCoercion(values []string) { + for _, value := range values { + for _, pair := range splitChainedSetValue(value) { + if looksLikeAccidentalSetCoercion(pair) { + emitSetValueCoercionWarning(pair) + } + } + } +} diff --git a/pkg/modeline/contract_find_and_parse_test.go b/pkg/modeline/contract_find_and_parse_test.go new file mode 100644 index 00000000..f52d0945 --- /dev/null +++ b/pkg/modeline/contract_find_and_parse_test.go @@ -0,0 +1,234 @@ +// Copyright Cozystack Authors +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package modeline + +import ( + "os" + "path/filepath" + "reflect" + "strings" + "testing" + + cerrors "github.com/cockroachdb/errors" +) + +// TestFindAndParseModeline_ModelineOnFirstLine_NoLeadingComments +// pins the degenerate case: the conventional shape (modeline on +// line 1) parses correctly and returns an empty leading-comments +// slice. The new helper must subsume ReadAndParseModeline's +// behaviour for this canonical path. +func TestFindAndParseModeline_ModelineOnFirstLine_NoLeadingComments(t *testing.T) { + dir := t.TempDir() + file := filepath.Join(dir, "node.yaml") + body := "# talm: nodes=[\"1.2.3.4\"], endpoints=[\"1.2.3.4\"], templates=[\"templates/controlplane.yaml\"]\n" + + "machine:\n type: controlplane\n" + if err := os.WriteFile(file, []byte(body), 0o644); err != nil { + t.Fatal(err) + } + + leading, config, err := FindAndParseModeline(file) + if err != nil { + t.Fatalf("FindAndParseModeline: %v", err) + } + + if len(leading) != 0 { + t.Errorf("expected zero leading comments when modeline is on line 1; got %d: %v", len(leading), leading) + } + + want := &Config{ + Nodes: []string{"1.2.3.4"}, + Endpoints: []string{"1.2.3.4"}, + Templates: []string{"templates/controlplane.yaml"}, + } + if !reflect.DeepEqual(config, want) { + t.Errorf("Config = %+v, want %+v", config, want) + } +} + +// TestFindAndParseModeline_PreservesLeadingCommentLines pins the +// new contract for `talm template -I`: operator-authored comment +// lines above the modeline are returned verbatim so the rewrite +// path can prepend them back to the regenerated file. +func TestFindAndParseModeline_PreservesLeadingCommentLines(t *testing.T) { + dir := t.TempDir() + file := filepath.Join(dir, "node.yaml") + body := "# Note 1: reset 2026-05-12 after ticket OPS-1234\n" + + "# Note 2: DO NOT edit values directly\n" + + "# talm: nodes=[\"1.2.3.4\"], endpoints=[\"1.2.3.4\"], templates=[\"templates/controlplane.yaml\"]\n" + + "machine:\n type: controlplane\n" + if err := os.WriteFile(file, []byte(body), 0o644); err != nil { + t.Fatal(err) + } + + leading, config, err := FindAndParseModeline(file) + if err != nil { + t.Fatalf("FindAndParseModeline: %v", err) + } + + wantLeading := []string{ + "# Note 1: reset 2026-05-12 after ticket OPS-1234", + "# Note 2: DO NOT edit values directly", + } + if !reflect.DeepEqual(leading, wantLeading) { + t.Errorf("leading comments mismatch\ngot: %q\nwant: %q", leading, wantLeading) + } + + if config == nil || len(config.Nodes) != 1 || config.Nodes[0] != "1.2.3.4" { + t.Errorf("config not parsed correctly: %+v", config) + } +} + +// TestFindAndParseModeline_AllowsBlankLinesBeforeModeline pins +// the relaxed prefix shape: blank lines and comments are allowed +// in any order before the modeline. +func TestFindAndParseModeline_AllowsBlankLinesBeforeModeline(t *testing.T) { + dir := t.TempDir() + file := filepath.Join(dir, "node.yaml") + body := "# Note A\n" + + "\n" + + "# Note B\n" + + "\n" + + "# talm: nodes=[\"1.2.3.4\"], endpoints=[\"1.2.3.4\"], templates=[\"t.yaml\"]\n" + + "body: ...\n" + if err := os.WriteFile(file, []byte(body), 0o644); err != nil { + t.Fatal(err) + } + + leading, _, err := FindAndParseModeline(file) + if err != nil { + t.Fatalf("FindAndParseModeline: %v", err) + } + + wantLeading := []string{ + "# Note A", + "", + "# Note B", + "", + } + if !reflect.DeepEqual(leading, wantLeading) { + t.Errorf("leading lines (comments + blanks) mismatch\ngot: %q\nwant: %q", leading, wantLeading) + } +} + +// TestFindAndParseModeline_MisplacedModelineBelowYAML pins the +// strictness budget: a `# talm:` line sitting *below* arbitrary +// YAML is rejected with a specific "modeline below non-comment +// content" hint, NOT classified as an orphan file. Without this +// gate the operator would never learn that their modeline is in +// the wrong place. +func TestFindAndParseModeline_MisplacedModelineBelowYAML(t *testing.T) { + dir := t.TempDir() + file := filepath.Join(dir, "node.yaml") + body := "machine:\n type: controlplane\n" + + "# talm: nodes=[\"1.2.3.4\"], endpoints=[\"1.2.3.4\"], templates=[\"t.yaml\"]\n" + if err := os.WriteFile(file, []byte(body), 0o644); err != nil { + t.Fatal(err) + } + + _, _, err := FindAndParseModeline(file) + if err == nil { + t.Fatal("expected error: modeline below YAML must be rejected") + } + if cerrors.Is(err, ErrModelineNotFound) { + t.Errorf("misplaced modeline must NOT be classified as orphan (ErrModelineNotFound); got: %v", err) + } + if !strings.Contains(err.Error(), "modeline found below non-comment content") { + t.Errorf("error must name the misplacement so operator can fix it; got: %v", err) + } +} + +// TestFindAndParseModeline_OrphanWithLeadingComments_ReturnsErrModelineNotFound +// pins the apply / upgrade side-patch dispatch: a file with +// `#`-prefixed comments above plain YAML, but NO `# talm:` line +// anywhere, is a legitimate orphan and must surface +// ErrModelineNotFound. The caller (apply.fileHasTalmModeline) +// routes orphan files into the direct-patch / side-patch path; +// any other error type collapses that dispatch into a misleading +// "parse error" surface. +func TestFindAndParseModeline_OrphanWithLeadingComments_ReturnsErrModelineNotFound(t *testing.T) { + dir := t.TempDir() + file := filepath.Join(dir, "node.yaml") + body := "# Note A\n# Note B\nmachine:\n type: controlplane\n" + if err := os.WriteFile(file, []byte(body), 0o644); err != nil { + t.Fatal(err) + } + + _, _, err := FindAndParseModeline(file) + if err == nil { + t.Fatal("expected ErrModelineNotFound: orphan file with leading comments") + } + if !cerrors.Is(err, ErrModelineNotFound) { + t.Errorf("orphan file with leading comments must surface ErrModelineNotFound so callers route to side-patch path; got: %v", err) + } +} + +// TestFindAndParseModeline_IndentedTalmCommentInBody_NotMisclassified +// pins the false-positive guard: a YAML body containing an operator- +// authored `# talm: …` comment line (indented under a YAML key, e.g. +// " # talm: see docs for how this section interacts with the modeline") +// must NOT be classified as a misplaced modeline. The canonical +// modeline always sits at column 0; indented `# talm:` text inside +// the body is a regular YAML comment. Without this guard, legitimate +// node files with documentation comments would be rejected as +// "modeline found below non-comment content". +func TestFindAndParseModeline_IndentedTalmCommentInBody_NotMisclassified(t *testing.T) { + dir := t.TempDir() + file := filepath.Join(dir, "node.yaml") + body := "machine:\n" + + " type: controlplane\n" + + " # talm: see docs for how this section interacts with the modeline\n" + + " network:\n" + + " hostname: cp01\n" + if err := os.WriteFile(file, []byte(body), 0o644); err != nil { + t.Fatal(err) + } + + _, _, err := FindAndParseModeline(file) + if err == nil { + t.Fatal("expected ErrModelineNotFound: indented # talm: comment is not a modeline") + } + if !cerrors.Is(err, ErrModelineNotFound) { + t.Errorf("indented # talm: comment in body must surface as ErrModelineNotFound (orphan), not misplaced; got: %v", err) + } +} + +// TestFindAndParseModeline_OrphanPlainYAML_ReturnsErrModelineNotFound +// pins the simplest orphan shape: plain YAML with no comments and +// no modeline. Same contract as the leading-comments orphan — +// must return ErrModelineNotFound. +func TestFindAndParseModeline_OrphanPlainYAML_ReturnsErrModelineNotFound(t *testing.T) { + dir := t.TempDir() + file := filepath.Join(dir, "node.yaml") + if err := os.WriteFile(file, []byte("machine: {}\n"), 0o644); err != nil { + t.Fatal(err) + } + + _, _, err := FindAndParseModeline(file) + if err == nil { + t.Fatal("expected ErrModelineNotFound: plain-YAML orphan") + } + if !cerrors.Is(err, ErrModelineNotFound) { + t.Errorf("plain-YAML orphan must surface ErrModelineNotFound; got: %v", err) + } +} + +// TestFindAndParseModeline_MissingFile pins the file-IO error +// path: a non-existent file surfaces an error, not a panic. +func TestFindAndParseModeline_MissingFile(t *testing.T) { + _, _, err := FindAndParseModeline(filepath.Join(t.TempDir(), "nope.yaml")) + if err == nil { + t.Fatal("expected error for missing file") + } +} diff --git a/pkg/modeline/contract_test.go b/pkg/modeline/contract_test.go index cfe5a9a4..3f9884e9 100644 --- a/pkg/modeline/contract_test.go +++ b/pkg/modeline/contract_test.go @@ -26,8 +26,6 @@ package modeline import ( - "os" - "path/filepath" "reflect" "strings" "testing" @@ -131,75 +129,6 @@ func TestContract_ParseModeline_EmptyArrays(t *testing.T) { } } -// === ReadAndParseModeline === - -// Contract: ReadAndParseModeline opens a file, reads only the first -// line, and parses it as a modeline. Subsequent lines are not read -// (the modeline is conventionally the very first line, and the rest -// of the file is YAML the helm engine consumes). -func TestContract_ReadAndParseModeline_FirstLineOnly(t *testing.T) { - dir := t.TempDir() - file := filepath.Join(dir, "node.yaml") - content := `# talm: nodes=["1.2.3.4"] -# talm: nodes=["should-be-ignored"] -machine: - type: worker -` - err := os.WriteFile(file, []byte(content), 0o644) - if err != nil { - t.Fatal(err) - } - got, err := ReadAndParseModeline(file) - if err != nil { - t.Fatalf("unexpected error: %v", err) - } - if !reflect.DeepEqual(got.Nodes, []string{testNodeIP1}) { - t.Errorf("expected first line only, got %+v", got) - } -} - -// Contract: a missing file produces an error mentioning the path so -// the operator can fix it without grepping. -func TestContract_ReadAndParseModeline_MissingFile(t *testing.T) { - missing := filepath.Join(t.TempDir(), "nope.yaml") - _, err := ReadAndParseModeline(missing) - if err == nil { - t.Fatal("expected error for missing file") - } -} - -// Contract: an empty file produces a precise error. -func TestContract_ReadAndParseModeline_EmptyFile(t *testing.T) { - dir := t.TempDir() - file := filepath.Join(dir, "empty.yaml") - err := os.WriteFile(file, []byte(""), 0o644) - if err != nil { - t.Fatal(err) - } - _, err = ReadAndParseModeline(file) - if err == nil { - t.Fatal("expected error for empty file") - } - if !strings.Contains(err.Error(), "empty") { - t.Errorf("error must mention 'empty', got: %v", err) - } -} - -// Contract: a file whose first line is not a modeline surfaces the -// parse error verbatim — the file-read path is a thin wrapper. -func TestContract_ReadAndParseModeline_NonModelineFirstLine(t *testing.T) { - dir := t.TempDir() - file := filepath.Join(dir, "no-modeline.yaml") - err := os.WriteFile(file, []byte("machine:\n"), 0o644) - if err != nil { - t.Fatal(err) - } - _, err = ReadAndParseModeline(file) - if err == nil { - t.Fatal("expected error for first line without modeline") - } -} - // === GenerateModeline === // Contract: GenerateModeline emits a line that ParseModeline accepts diff --git a/pkg/modeline/modeline.go b/pkg/modeline/modeline.go index 88621abb..2fd68c0d 100644 --- a/pkg/modeline/modeline.go +++ b/pkg/modeline/modeline.go @@ -17,6 +17,15 @@ type Config struct { Templates []string } +// ErrModelineNotFound is the sentinel cause FindAndParseModeline +// returns (wrapped with a hint) when the input file has no +// `# talm: …` line at all. Distinct from "found but malformed": +// callers route the not-found case onto a different path (e.g. +// direct-patch in apply, "this isn't a modelined node file" in +// completion) while malformed-modeline errors bubble up so the +// operator sees their typo. Match with errors.Is. +var ErrModelineNotFound = errors.New("modeline not found") + // ParseModeline parses a modeline string and populates the Config structure. func ParseModeline(line string) (*Config, error) { config := &Config{} @@ -72,38 +81,114 @@ func ParseModeline(line string) (*Config, error) { ) } -// ReadAndParseModeline reads the first line from a file and parses the modeline. -func ReadAndParseModeline(filePath string) (*Config, error) { +// FindAndParseModeline scans a file for the talm modeline, allowing +// operator-authored comment lines (`^#`) and blank lines as a leading +// prefix. The first non-comment, non-blank line must be the modeline +// itself (`# talm: …`); arbitrary YAML or prose before the modeline +// is rejected. +// +// Returns the leading comment / blank lines verbatim (without trailing +// `\n`), the parsed Config, and any error. `talm template -I` uses +// the leading-lines return to preserve operator documentation when +// the in-place rewrite overwrites the file. Every other talm +// workflow that consumes node files (apply, upgrade, completion, +// wrapped talosctl commands) calls this function too so the +// file-shape contract is uniform across the surface. +func FindAndParseModeline(filePath string) ([]string, *Config, error) { file, err := os.Open(filePath) if err != nil { //nolint:wrapcheck // cockroachdb/errors.WithHintf is the project's wrapping/hinting idiom - return nil, errors.WithHintf( + return nil, nil, errors.WithHintf( errors.Wrap(err, "error opening config file"), "check that %s exists and is readable", filePath, ) } defer func() { _ = file.Close() }() + var leading []string + scanner := bufio.NewScanner(file) - if scanner.Scan() { - firstLine := scanner.Text() + for scanner.Scan() { + line := scanner.Text() + trim := strings.TrimSpace(line) + + // Blank line or a non-modeline comment: keep collecting. + if trim == "" || (strings.HasPrefix(trim, "#") && !strings.HasPrefix(trim, "# talm:")) { + leading = append(leading, line) + + continue + } + + // Modeline candidate. + if strings.HasPrefix(trim, "# talm:") { + config, parseErr := ParseModeline(line) + if parseErr != nil { + // ParseModeline already wraps + WithHint at its boundary. + return nil, nil, parseErr + } + + return leading, config, nil + } - return ParseModeline(firstLine) + // First non-comment, non-blank line is not a modeline. + // Distinguish orphan (no modeline anywhere) from misplaced + // modeline (a `# talm:` line lives below YAML) via lookahead. + return nil, nil, classifyNoLeadingModeline(scanner, line) } - err = scanner.Err() - if err != nil { + if scanErr := scanner.Err(); scanErr != nil { //nolint:wrapcheck // cockroachdb/errors.WithHint is the project's wrapping/hinting idiom - return nil, errors.WithHint( - errors.Wrap(err, "error reading first line of config file"), + return nil, nil, errors.WithHint( + errors.Wrap(scanErr, "error reading config file"), "file may be truncated or unreadable", ) } //nolint:wrapcheck // cockroachdb/errors.WithHint is the project's wrapping/hinting idiom - return nil, errors.WithHint( - errors.New("config file is empty"), - "per-node values file must start with a modeline like '# talm: nodes=[...]'", + return nil, nil, errors.WithHint( + ErrModelineNotFound, + "per-node values file must contain a `# talm: nodes=[…]` modeline; comments and blanks may precede it but at least one modeline must be present", + ) +} + +// classifyNoLeadingModeline picks between the "orphan" and +// "misplaced modeline" outcomes when FindAndParseModeline has hit +// a non-comment line without first seeing a `# talm:` modeline. +// Scans the rest of the file: a later `# talm:` line means the +// operator put the modeline below YAML (rejected with a clear +// hint); no `# talm:` anywhere means the file is a legitimate +// orphan (returned as ErrModelineNotFound so callers can route to +// the side-patch / direct-patch path). +func classifyNoLeadingModeline(scanner *bufio.Scanner, firstNonComment string) error { + for scanner.Scan() { + // Column-0 prefix only — the canonical modeline always + // sits at the very start of the line. Indented `# talm:` + // text inside a YAML body is an operator-authored + // comment (e.g. "# talm: see the modeline above for + // nodes/templates wiring"), NOT a misplaced modeline. + // TrimSpace-then-HasPrefix would false-positive on those + // and block legitimate node files. + if strings.HasPrefix(scanner.Text(), "# talm:") { + //nolint:wrapcheck // cockroachdb/errors.WithHint is the project's wrapping/hinting idiom + return errors.WithHint( + errors.Newf("modeline found below non-comment content: first non-comment line was %q", firstNonComment), + "the `# talm: …` modeline must precede any YAML content; only `#`-prefixed comments and blank lines may sit above it", + ) + } + } + + if scanErr := scanner.Err(); scanErr != nil { + //nolint:wrapcheck // cockroachdb/errors.WithHint is the project's wrapping/hinting idiom + return errors.WithHint( + errors.Wrap(scanErr, "error reading config file"), + "file may be truncated or unreadable", + ) + } + + //nolint:wrapcheck // cockroachdb/errors.WithHint is the project's wrapping/hinting idiom + return errors.WithHint( + ErrModelineNotFound, + "per-node values file must contain a `# talm: nodes=[…]` modeline; comments and blanks may precede it but at least one modeline must be present", ) }