feat(charts,modeline,upgrade): operator extension points and dev17 install fixes#210
feat(charts,modeline,upgrade): operator extension points and dev17 install fixes#210lexfrei wants to merge 3 commits into
Conversation
ParseModeline previously split the modeline body on the literal `, `
(comma+space) string, which is the separator GenerateModeline emits
between key-pairs. JSON arrays serialised by encoding/json carry no
space after each inner comma ("["a","b"]"), so the canonical
talm-generated form parsed back cleanly. But a hand-authored modeline
with whitespace inside the array ("nodes=["a", "b"]" — the
natural shape) was cut at the first inner comma. The leading half
("nodes=["a"") then failed JSON parsing with
"unexpected end of JSON input", with a hint that misled the operator
into thinking their JSON was malformed.
Replace the literal split with a depth-aware tokenizer that tracks
`[`/`]` nesting and JSON string state (including backslash
escapes) and only emits a key-pair boundary when it sees a `,` at
depth 0 outside a string literal. The canonical no-space form still
parses; the human-written form now parses; commas inside string
literals never split the value.
Update the contract test that pinned the old strict separator to
document the relaxation, and add cases for multi-element arrays with
various whitespace shapes plus comma-in-string-literal coverage.
Extend the manual-test-plan with section C6f covering the parser
behaviour on both anchor and side-patch slots.
Signed-off-by: Aleksei Sviridkin <f@lex.la>
…eneric presets
Operators wanting to extend the Talos machine config rendered by the
cozystack or generic presets had to fork the chart or hand-edit the
generated per-node YAML against the autogen banner asking them to
keep template edits in values. Add four "extra*" values keys that
plug into the load-bearing sections of machine.* — kernel modules,
sysctls, kubelet.extraConfig, machine.files — without overriding the
preset's hardcoded defaults.
Cozystack preset (append / merge to existing built-ins):
extraKernelModules [] append to openvswitch / drbd / zfs / spl
/ vfio_pci / vfio_iommu_type1
extraKubeletExtraArgs {} merge into cpuManagerPolicy + maxPods;
last-key-wins on conflict (escape hatch)
extraSysctls {} merge into gc_thresh* + nr_hugepages
extraMachineFiles [] append to CRI customization + lvm.conf
Generic preset (emit only when set):
extraKernelModules [] emits machine.kernel.modules block
extraKubeletExtraArgs {} emits kubelet.extraConfig block
extraSysctls {} emits machine.sysctls block
extraMachineFiles [] emits machine.files block
Contract tests under pkg/engine pin append vs merge semantics for
every key on cozystack, on / off state on generic, and the default
generic render still emits no machine.kernel / sysctls / files /
extraConfig blocks (the existing NoCozystackOpinionsOnGeneric guard
stays green). Manual-test-plan B7 / B8 cover both presets with
forward-looking yq queries against the rendered output.
Signed-off-by: Aleksei Sviridkin <f@lex.la>
…grade talm upgrade resolves the target installer image from values.yaml (canonical "raise the cluster's Talos version" workflow), but the node body's machine.install.image was never resynced afterwards. A follow-up talm apply re-rendered the chart against values.yaml and got the new image, then merged the stale body via MergeFileAsPatch and silently pinned the cluster back to the pre-upgrade image — the next A/B boot would roll back without warning. Point-patch machine.install.image in every -f node body after the upgrade RPC + post-upgrade verify return success. Uses a yaml.v3 node round-trip so the modeline, autogen banner, sibling keys, and per-key comments survive untouched; files without an install.image key (side-patches, orphans, partials) are silently skipped so the handler can blindly fan out over the full -f list. Verify failure intentionally leaves the body alone: the node is on the pre-upgrade image after auto-rollback, and the body must reflect what the node actually runs — not what talosctl was asked to install. Operator opt-out via --skip-post-upgrade-verify (or the --insecure / --stage paths that drop Phase 2C entirely) patches unconditionally on RPC success, trading the verify gate's safety for body-sync. Contract tests pin scalar swap, idempotency (no rewrite when already on target), silent-skip across three no-key shapes, structural-error surface for unexpected machine.install.image kinds, and file-list fan-out semantics. Manual-test-plan E4 covers the happy path, failure-path skip, and operator-opt-out path against a real cluster. Signed-off-by: Aleksei Sviridkin <f@lex.la>
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces operator-supplied extension points for kernel modules, sysctls, kubelet arguments, and machine files across the cozystack and generic presets. It also implements a post-upgrade synchronization feature that updates the machine.install.image in node configuration files to prevent accidental reverts during future applies. Furthermore, the modeline parser has been enhanced to correctly handle commas within JSON arrays. Review feedback recommends refactoring the Helm template collision checks to use lists for improved maintainability and adopting the idiomatic errors.Is(err, io.EOF) check in the test suite.
| {{- range $k, $_ := .Values.extraKubeletExtraArgs }} | ||
| {{- if or (eq $k "cpuManagerPolicy") (eq $k "maxPods") }} | ||
| {{- fail (printf "values.yaml: extraKubeletExtraArgs.%s collides with the cozystack preset's built-in kubelet.extraConfig — keys never override (yaml.v3 rejects duplicate map keys on decode). Remove the entry from extraKubeletExtraArgs, or fork the chart preset if you need a different default." $k) }} | ||
| {{- end }} | ||
| {{- end }} |
There was a problem hiding this comment.
For better readability and maintainability, you could define a list of the built-in keys and use the has function to check for collisions. This avoids a long or chain and makes it easier to manage the list of keys.
{{- $builtInKubeletKeys := list "cpuManagerPolicy" "maxPods" -}}
{{- range $k, $_ := .Values.extraKubeletExtraArgs -}}
{{- if has $k $builtInKubeletKeys -}}
{{- fail (printf "values.yaml: extraKubeletExtraArgs.%s collides with the cozystack preset's built-in kubelet.extraConfig — keys never override (yaml.v3 rejects duplicate map keys on decode). Remove the entry from extraKubeletExtraArgs, or fork the chart preset if you need a different default." $k) -}}
{{- end -}}
{{- end -}}
| {{- range $k, $_ := .Values.extraSysctls }} | ||
| {{- if or (eq $k "vm.nr_hugepages") (eq $k "net.ipv4.neigh.default.gc_thresh1") (eq $k "net.ipv4.neigh.default.gc_thresh2") (eq $k "net.ipv4.neigh.default.gc_thresh3") }} | ||
| {{- fail (printf "values.yaml: extraSysctls.%s collides with the cozystack preset's built-in machine.sysctls — keys never override (yaml.v3 rejects duplicate map keys on decode). Remove the entry from extraSysctls, or fork the chart preset if you need a different default." $k) }} | ||
| {{- end }} | ||
| {{- end }} |
There was a problem hiding this comment.
Similar to the extraKubeletExtraArgs check, using a list for the built-in sysctl keys and checking with has would improve readability and make this easier to maintain.
{{- $builtInSysctls := list "vm.nr_hugepages" "net.ipv4.neigh.default.gc_thresh1" "net.ipv4.neigh.default.gc_thresh2" "net.ipv4.neigh.default.gc_thresh3" -}}
{{- range $k, $_ := .Values.extraSysctls -}}
{{- if has $k $builtInSysctls -}}
{{- fail (printf "values.yaml: extraSysctls.%s collides with the cozystack preset's built-in machine.sysctls — keys never override (yaml.v3 rejects duplicate map keys on decode). Remove the entry from extraSysctls, or fork the chart preset if you need a different default." $k) -}}
{{- end -}}
{{- end -}}
| if err.Error() == "EOF" { | ||
| break | ||
| } |
There was a problem hiding this comment.
Checking for end-of-file by comparing the error string err.Error() == "EOF" is brittle. The idiomatic and more robust way to check for io.EOF is to use errors.Is(err, io.EOF). You may need to import the io and errors packages.
| if err.Error() == "EOF" { | |
| break | |
| } | |
| if errors.Is(err, io.EOF) { | |
| break | |
| } |
|
Closing in favour of a follow-up PR with a cleaner branch name and description. |
Summary
Three independent improvements surfaced while installing Cozystack on a fresh 3-node Talos cluster (
dev17, OCI VM.Standard.E5.Flex, Talos v1.12.6 → v1.13.0). Each commit stands alone and can be reverted independently.fix(modeline): split modeline parts on top-level commas onlyThe
# talm: nodes=[...], endpoints=[...]parser usedstrings.SplitSeq(content, ", ")which cut the value at the first comma INSIDE a multi-element JSON array. Hand-authored shared side-patches withnodes=["a", "b"](space after comma) hiterror parsing JSON array for key nodes, value ["a". The talm-generated form happened to dodge it becausejson.Marshalof a string slice emits no space after the comma.Replace the literal split with a depth-aware tokenizer that splits on
,only at JSON-array depth 0, tracking string state and backslash escapes. Both the canonical no-space form and the human form parse cleanly. Comma inside string literals never splits. Empty arrays, trailing commas, and unbalanced brackets keep their existing semantics (test-pinned).feat(charts): expose Talos operator extension points in cozystack + generic presetsOperators wanting to extend the rendered Talos config (
machine.kernel.modules,machine.sysctls,machine.kubelet.extraConfig,machine.files) had to fork the chart or hand-edit the generatednodes/<n>.yamlagainst the autogen banner. Adds fourextra*values keys to both presets.On cozystack (defaults present): list-shaped keys append; map-shaped keys are collision-checked at template time and
failwith a precise hint if an operator key names a built-in. Built-ins are never overridable — yaml.v3 rejects duplicate map keys on decode, so a silent emit-both would produce a config that cannot round-trip.On generic (no defaults): each block emits only when the matching
extra*key is non-empty.fix(upgrade): point-patch node body install.image after successful upgradetalm upgraderesolves the target installer image fromvalues.yaml, but the per-nodenodes/<n>.yaml::machine.install.imagewas never resynced afterwards. A follow-uptalm applyrendered the chart againstvalues.yaml(got the new image), then merged the stale body viaMergeFileAsPatchand silently pinned the cluster back to the pre-upgrade image — the next A/B boot would roll back without warning.After successful RPC + post-upgrade verify, point-patch
machine.install.imagein every-fnode body to the applied image. Uses a yaml.v3 streaming decoder so the v1.12+ multi-doc shape (machine + cluster + RegistryMirrorConfig + LinkConfig + Layer2VIPConfig + VLANConfig) is preserved intact —yaml.Unmarshalreads only the first document and would silently erase the rest. Encoder pinned at 2-space indent so the diff stays a single line. Comments and modeline survive untouched.Verify-failure (auto-rollback) intentionally leaves the body alone: the node ran the old image, so the body must reflect that. Operator opt-out via
--skip-post-upgrade-verify(and--insecure/--stagepaths that drop Phase 2C entirely) patches unconditionally on RPC success.Verification
go test ./...: all packages green.golangci-lint run ./...: 0 issues.Docs sync
The new
extra*keys and the post-upgrade body-sync behaviour are operator-visible. A parallel update atcozystack/websitewill follow under a separate PR.