Skip to content

feat(init): --cluster-endpoint flag and talm.key recovery hint#206

Merged
lexfrei merged 2 commits into
mainfrom
feat/init-endpoint-and-key-hints
May 14, 2026
Merged

feat(init): --cluster-endpoint flag and talm.key recovery hint#206
lexfrei merged 2 commits into
mainfrom
feat/init-endpoint-and-key-hints

Conversation

@lexfrei
Copy link
Copy Markdown
Contributor

@lexfrei lexfrei commented May 14, 2026

Closes the next init UX batch (#205 + #177). Both touch operator-facing pain points on init flow and are independent at the file level.

Commits

fix(init): hint at talm.key recovery path when --decrypt fails (#177)

talm.key absence (lost backup, fresh checkout, laptop swap) used to surface a raw open .../talm.key: no such file that reads like an internal bug. Distinguish the not-exist case from other I/O errors at the LoadKey boundary and attach a hint naming both recovery paths: restore the backed-up key, or re-run talm init to regenerate (with the warning that regeneration writes new secrets and the old secrets.encrypted.yaml becomes undecryptable). Pinned by TestContract_Age_LoadKey_MissingFileSurfacesRecoveryHint.

feat(init): --cluster-endpoint flag and single-endpoint auto-derive for values.yaml (#205)

Two distinct "endpoint" concepts in talm shared the same flag space and only one was addressed:

  • talosconfig.contexts.<name>.endpoints — talosctl-client round-robin list (populated by --endpoints)
  • values.yaml::endpoint — Kubernetes control-plane URL the chart renders into every node's MachineConfig

Adds a resolution chain that bridges them without breaking multi-CP scenarios where picking one node is wrong:

  1. --cluster-endpoint URL — explicit operator override, validated for scheme + host + port.
  2. --endpoints with exactly one entry — auto-derive https://<that>:6443.
  3. Empty — late hint at end of init nudges the operator at values.yaml::endpoint with VIP / LB examples.

Multi-endpoint --endpoints never auto-derives (would silently couple cluster availability to one arbitrary node). Validation runs in PreRunE so a malformed --cluster-endpoint short-circuits before any files land on disk. Pinned by TestResolveClusterEndpoint, TestApplyEndpointOverride, TestPrintClusterEndpointHintIfEmpty.

Test plan

docs/manual-test-plan.md extended:

  • A6ainit --decrypt without talm.key surfaces recovery hint
  • A9--cluster-endpoint populates values.yaml::endpoint
  • A10 — single --endpoints auto-derives
  • A11 — malformed --cluster-endpoint rejected before any files land on disk

README.md Getting Started section documents the two-endpoint distinction + the new flag.

Validation

  • go build ./... clean
  • go test ./... -count=1 all packages green
  • golangci-lint run ./... 0 issues
  • 5 dev17 smokes (single auto-derive, multi-no-derive + hint, cluster-endpoint precedence, malformed rejection, talm.key missing hint) all PASS

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced --cluster-endpoint flag to specify the Kubernetes control-plane endpoint during initialization.
  • Bug Fixes

    • Improved error messages when encryption key is missing, now providing clear recovery options (restore backup or regenerate keys).
  • Documentation

    • Enhanced initialization guide with control-plane endpoint configuration details.
    • Added manual test plan covering endpoint handling and key recovery scenarios.

Review Change Stack

lexfrei added 2 commits May 14, 2026 14:04
Talm.key absence is a real operational scenario (laptop swap, lost
backup, fresh checkout from a repo whose .gitignore properly
excludes the key) and the raw "open .../talm.key: no such file"
that LoadKey emitted surfaces as if it's an internal bug rather
than a "you need to restore your key" condition.

Distinguish the not-exist case from other I/O errors at the
LoadKey boundary and attach a hint naming both recovery paths
the operator can take: restore the backed-up key, or re-run
\`talm init\` to regenerate (with the warning that regeneration
writes new secrets and the old secrets.encrypted.yaml becomes
undecryptable without the original key).

Pinned by TestContract_Age_LoadKey_MissingFileSurfacesRecoveryHint.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
…or values.yaml

Two distinct concepts share the word "endpoint" in talm and
the CLI flag --endpoints only addressed one of them:

  * talosconfig.contexts.<name>.endpoints — list of host[:port]
    for the talosctl-client round-robin. Populated by --endpoints.
  * values.yaml::endpoint — single URL (https://host:port) that
    the chart renders into cluster.controlPlane.endpoint of every
    node's MachineConfig. Used by kubelet / kube-proxy to dial
    the kube-apiserver.

Operators reasonably expected --endpoints to populate both;
discovering the split required reading the README. Add a
resolution chain that bridges the two without breaking
multi-endpoint scenarios where picking one node is wrong:

  1. --cluster-endpoint URL — explicit operator override,
     validated for scheme + host + port.
  2. --endpoints with exactly one entry — auto-derive
     https://<that>:6443 (unambiguous single-target case).
  3. Empty — late hint at end of init nudges the operator at
     values.yaml::endpoint with examples for VIP / LB cases.

Multi-endpoint --endpoints never auto-derives: in multi-cp
topologies the correct value is a VIP / LB / DNS round-robin,
not one arbitrarily-picked node IP whose loss would couple
cluster availability to that node.

Validation of --cluster-endpoint happens twice: once in
PreRunE so a malformed flag short-circuits before any files
land on disk, and again in resolveClusterEndpoint as defense in
depth for direct callers.

Pinned by:
  * TestResolveClusterEndpoint — precedence + multi-endpoint
    safety + malformed rejection across 7 sub-cases.
  * TestApplyEndpointOverride — values.yaml rewrite preserves
    surrounding content, no-op on presets without the field.
  * TestPrintClusterEndpointHintIfEmpty — hint fires on
    explicit empty / single-quoted empty / bare-colon empty;
    stays silent on populated / missing / no-field cases.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 14, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b575501f-c090-4f73-b6ce-df50c766341b

📥 Commits

Reviewing files that changed from the base of the PR and between a0301e7 and 31a27cd.

📒 Files selected for processing (6)
  • README.md
  • docs/manual-test-plan.md
  • pkg/age/age.go
  • pkg/age/contract_test.go
  • pkg/commands/init.go
  • pkg/commands/init_test.go

📝 Walkthrough

Walkthrough

The PR enhances talm init initialization with two complementary improvements: clearer error recovery when the encryption key is missing, and a new --cluster-endpoint flag to control Kubernetes control-plane URL configuration in values.yaml. All changes are documented and covered by manual tests.

Changes

Init Error Recovery and Cluster Endpoint Configuration

Layer / File(s) Summary
Missing encryption key recovery hint
pkg/age/age.go, pkg/age/contract_test.go, docs/manual-test-plan.md (A6a)
When talm.key is missing, LoadKey detects the absence and returns a hinted error explaining both recovery paths: restore from backup or re-run talm init to regenerate (with explicit warning that regeneration won't decrypt old secrets).
Cluster endpoint flag and resolution logic
pkg/commands/init.go (imports, flag struct, PreRunE validation, helper functions, flag registration), pkg/commands/init_test.go (TestResolveClusterEndpoint)
Adds --cluster-endpoint flag validated early in PreRunE to fail before any files are written. Implements precedence: explicit flag → single-endpoint auto-derive (https://<host>:6443) → empty. Malformed URLs are rejected with guidance.
Endpoint override and empty-endpoint hinting
pkg/commands/init.go (resolution call, override application, hint printing call/logic), pkg/commands/init_test.go (TestApplyEndpointOverride, TestPrintClusterEndpointHintIfEmpty)
Applies the resolved endpoint to values.yaml::endpoint: by rewriting the YAML line; after file write, detects explicitly empty endpoints and prints a hint guiding the operator to set it, noting that --endpoints only populates talosconfig routing.
Documentation and manual tests
README.md, docs/manual-test-plan.md (A9, A10, A11)
README documents --cluster-endpoint usage, examples, and differences from --endpoints. Manual tests verify successful endpoint override (A9), single-endpoint auto-derive (A10), malformed URL rejection (A11), and missing-key recovery (A6a).

Sequence Diagram

sequenceDiagram
  participant Operator
  participant Init as talm init
  participant Resolve as resolveClusterEndpoint
  participant Apply as applyEndpointOverride
  participant Hint as printClusterEndpointHintIfEmpty
  
  Operator->>Init: init --cluster-endpoint https://api.example.com:6443
  Init->>Init: PreRunE validate URL format early
  Init->>Resolve: resolve endpoint (flag/derive/empty)
  Resolve-->>Init: resolved endpoint or error
  Init->>Apply: rewrite values.yaml endpoint: line
  Apply-->>Init: updated bytes
  Init->>Hint: check values.yaml for empty endpoint
  Hint-->>Operator: (silent if populated, hint if empty)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

  • #205: This PR implements the exact feature proposed in the issue: adding --cluster-endpoint flag, single-endpoint auto-derive logic, endpoint validation in PreRunE, and comprehensive manual test coverage.

Poem

🐰 A key once lost finds a path to restore,
While endpoints now flow through a well-guarded door,
With hints for the lost and with flags freshly clear,
The cluster's wide open for all to draw near! ✨

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/init-endpoint-and-key-hints

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the --cluster-endpoint flag to the init command, allowing users to explicitly set the Kubernetes control-plane URL in values.yaml. It includes logic to auto-derive this URL from a single entry in --endpoints while providing a hint for multi-endpoint configurations. Additionally, the changes improve error reporting by adding actionable recovery hints when talm.key is missing. A review comment correctly identified a bug in the auto-derivation logic where endpoints already containing a port would result in malformed URLs.

Comment thread pkg/commands/init.go
Comment on lines +882 to +884
if len(endpoints) == 1 {
return "https://" + net.JoinHostPort(endpoints[0], "6443"), nil
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The auto-derivation logic for the cluster endpoint assumes that endpoints[0] is a bare host. However, in Talos, endpoints are frequently specified as host:port (e.g., 1.2.3.4:50000). If a port is present, net.JoinHostPort will append :6443 to the existing host:port string, resulting in an invalid URL like https://1.2.3.4:50000:6443. We should extract the host part before joining it with the Kubernetes API server port.

	if len(endpoints) == 1 {
		host := endpoints[0]
		if h, _, err := net.SplitHostPort(host); err == nil {
			host = h
		}
		return "https://" + net.JoinHostPort(host, "6443"), nil
	}

@lexfrei lexfrei marked this pull request as ready for review May 14, 2026 11:54
@lexfrei lexfrei added kind/feature Categorizes issue or PR as related to a new feature area/init Issues or PRs related to talm init (project bootstrap, encrypt/decrypt, --update flow) area/age Issues or PRs related to pkg/age (key generation, encrypt/decrypt, rotation) labels May 14, 2026
@lexfrei lexfrei merged commit fa94120 into main May 14, 2026
8 checks passed
@lexfrei lexfrei deleted the feat/init-endpoint-and-key-hints branch May 14, 2026 11:54
@lexfrei lexfrei self-assigned this May 15, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/age Issues or PRs related to pkg/age (key generation, encrypt/decrypt, rotation) area/init Issues or PRs related to talm init (project bootstrap, encrypt/decrypt, --update flow) kind/feature Categorizes issue or PR as related to a new feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant