diff --git a/README.md b/README.md index a0a656aa..b82e1381 100644 --- a/README.md +++ b/README.md @@ -61,7 +61,17 @@ talm init -p cozystack -N myawesomecluster --image factory.talos.dev/installer/< `--image` rewrites the top-level `image:` field in the preset's `values.yaml` before write. The flag is honored on initial `init` only — for an existing project, edit `values.yaml` directly. The `cozystack` preset declares `image:`; the `generic` preset does not, so `--image --preset generic` is rejected up front. -Edit `values.yaml` to set your cluster's control-plane endpoint. This is the URL every node's kubelet and kube-proxy will dial. The chart leaves it empty on purpose so a missed override fails loudly instead of silently embedding a placeholder. +To set the Kubernetes control-plane URL at init time, pass `--cluster-endpoint`: + +```bash +talm init -p cozystack -N myawesomecluster --cluster-endpoint https://vip.example.test:6443 +``` + +`--cluster-endpoint` writes the URL into `values.yaml::endpoint`, which the chart renders into `cluster.controlPlane.endpoint` of every node's MachineConfig (the URL kubelet and kube-proxy dial). The flag is honored on initial `init` only — for an existing project, edit `values.yaml` directly. + +`--endpoints` and `--cluster-endpoint` address different concepts: `--endpoints` (plural, list) populates the `talosconfig` context for the talosctl client; `--cluster-endpoint` (singular, full URL) populates the Kubernetes control-plane address inside the chart. When `--endpoints` is given a single value, init auto-derives `values.yaml::endpoint` as `https://:6443` — the single-target case is unambiguous. Multi-endpoint inputs never auto-derive (picking one node would silently couple cluster availability to it); the operator must pass `--cluster-endpoint` explicitly or fill `values.yaml::endpoint` later. The init flow prints a hint at the end when the field is left empty. + +Edit `values.yaml` to set your cluster's control-plane endpoint if neither flag set it. This is the URL every node's kubelet and kube-proxy will dial. The chart leaves it empty by default so a missed override fails loudly instead of silently embedding a placeholder. Endpoint / floatingIP combinations: diff --git a/docs/manual-test-plan.md b/docs/manual-test-plan.md index 81621dee..b8d1ba22 100644 --- a/docs/manual-test-plan.md +++ b/docs/manual-test-plan.md @@ -98,6 +98,65 @@ Expected: clear error mentioning the missing key path. rm -rf /tmp/talm-init-test ``` +### A6a. `init --decrypt` without `talm.key` surfaces recovery hint + +```bash +mkdir -p /tmp/talm-decrypt-test && cd /tmp/talm-decrypt-test +/tmp/talm-safety init --preset cozystack --name test --endpoints 192.0.2.1 +mv talm.key /tmp/talm.key.backup +/tmp/talm-safety init --decrypt +mv /tmp/talm.key.backup talm.key # restore for next run +rm -rf /tmp/talm-decrypt-test +``` + +Expected: error `failed to decrypt secrets: load key: read key file: open /talm.key: no such file or directory` followed by hint `talm.key is required to decrypt secrets.encrypted.yaml. Restore your backed-up key, or re-run \`talm init\` to regenerate (this writes new secrets — the old secrets.encrypted.yaml will not be decryptable without the original key).` + +Regression anchor: the hint must name BOTH recovery paths (restore from backup, re-run init to regenerate) AND include the warning that regeneration writes new secrets making the old encrypted secrets undecryptable. A regression that drops either path or the warning silently invites operators to "just run init again" without understanding the data-loss tradeoff. + +### A9. `init --cluster-endpoint` populates values.yaml::endpoint + +```bash +mkdir -p /tmp/talm-cluster-ep-test && cd /tmp/talm-cluster-ep-test +/tmp/talm-safety init --preset cozystack --name test \ + --endpoints 10.0.0.1,10.0.0.2,10.0.0.3 \ + --cluster-endpoint https://vip.example.test:6443 +grep "^endpoint:" values.yaml +grep -A 3 "endpoints:" talosconfig | head -5 +rm -rf /tmp/talm-cluster-ep-test +``` + +Expected: `values.yaml` has `endpoint: "https://vip.example.test:6443"` (operator's VIP, explicit), `talosconfig` has all three `10.0.0.1`/`10.0.0.2`/`10.0.0.3` under the context's endpoints array. The two flags address different concepts — `--cluster-endpoint` is the kube-apiserver URL, `--endpoints` is the talosctl-client list — and both round-trip independently. + +Regression anchor: removing `--cluster-endpoint` and passing only `--endpoints 10.0.0.1,10.0.0.2,10.0.0.3` MUST leave `values.yaml::endpoint` empty (the multi-endpoint case never auto-derives — picking one node would silently couple cluster availability to it). The init flow MUST then print a hint at the end pointing the operator at `values.yaml::endpoint` with examples for VIP / LB shapes. + +### A10. `init --endpoints` with single entry auto-derives values.yaml::endpoint + +```bash +mkdir -p /tmp/talm-single-ep-test && cd /tmp/talm-single-ep-test +/tmp/talm-safety init --preset cozystack --name test --endpoints 192.0.2.10 +grep "^endpoint:" values.yaml +rm -rf /tmp/talm-single-ep-test +``` + +Expected: `endpoint: "https://192.0.2.10:6443"` — the single-endpoint case is unambiguously "this is also the cluster URL", so init derives the canonical `https://:6443` form. No hint printed at end of init for this case. + +Regression anchor: this auto-derive ONLY fires when `len(--endpoints) == 1`. Multi-endpoint inputs MUST leave the field empty (see A9). + +### A11. `init --cluster-endpoint` rejects malformed URL before any files land on disk + +```bash +mkdir -p /tmp/talm-bad-ep-test && cd /tmp/talm-bad-ep-test +/tmp/talm-safety init --preset cozystack --name test \ + --endpoints 192.0.2.10 \ + --cluster-endpoint "not-a-url" +ls -la # should be empty +rm -rf /tmp/talm-bad-ep-test +``` + +Expected: error `cluster-endpoint "not-a-url" is missing scheme, host, or port` with hint pointing at the canonical form. Directory remains empty — NO `talosconfig`, NO `.gitignore`, NO `secrets.yaml` written. Validation happens in PreRunE before any file writes so a malformed flag never produces a half-initialised project tree. + +Regression anchor: a regression that defers validation to RunE (i.e. after the secret-bundle generation) would leave `talosconfig` and `.gitignore` behind — verify by checking `ls -la` after the failing command finds an empty directory. + ### A8. `init --update` without `--preset` and without preset dep in Chart.yaml ```bash diff --git a/pkg/age/age.go b/pkg/age/age.go index 590f441c..de6f8088 100644 --- a/pkg/age/age.go +++ b/pkg/age/age.go @@ -104,6 +104,20 @@ func LoadKey(rootDir string) (*age.X25519Identity, error) { keyData, err := os.ReadFile(keyFile) if err != nil { + if errors.Is(err, os.ErrNotExist) { + // Talm.key absence is a real operational scenario + // (laptop swap, lost backup, fresh checkout) and the + // raw "open ...: no such file" surfaces as if it's an + // internal bug. Attach the recovery hint naming both + // paths the operator can take so the error reads as + // "what to do next", not "talm is broken". + //nolint:wrapcheck // cockroachdb/errors.WithHint is the project's wrapping/hinting idiom at boundaries. + return nil, errors.WithHint( + errors.Wrap(err, "read key file"), + "talm.key is required to decrypt secrets.encrypted.yaml. Restore your backed-up key, or re-run `talm init` to regenerate (this writes new secrets — the old secrets.encrypted.yaml will not be decryptable without the original key).", + ) + } + return nil, errors.Wrap(err, "read key file") } diff --git a/pkg/age/contract_test.go b/pkg/age/contract_test.go index 368aa1f7..b7783d96 100644 --- a/pkg/age/contract_test.go +++ b/pkg/age/contract_test.go @@ -36,6 +36,7 @@ import ( "strings" "testing" + cerrors "github.com/cockroachdb/errors" "github.com/cozystack/talm/pkg/age" "gopkg.in/yaml.v3" ) @@ -181,6 +182,31 @@ func TestContract_Age_LoadKey_MissingFileErrors(t *testing.T) { } } +// TestContract_Age_LoadKey_MissingFileSurfacesRecoveryHint pins the +// operator-facing recovery message attached to a missing talm.key. +// Without the hint, operators see a raw "open talm.key: no such +// file" stack-style line and assume it's a bug in talm. The hint +// must name BOTH recovery paths: restore from backup, or re-run +// `talm init` to regenerate (which writes new secrets — old +// secrets.encrypted.yaml becomes undecryptable without the +// original key). +func TestContract_Age_LoadKey_MissingFileSurfacesRecoveryHint(t *testing.T) { + dir := t.TempDir() // no talm.key inside + + _, err := age.LoadKey(dir) + if err == nil { + t.Fatal("expected error for missing talm.key") + } + + hints := cerrors.GetAllHints(err) + joinedLower := strings.ToLower(strings.Join(hints, "\n")) + for _, want := range []string{"talm.key", "restore", "talm init"} { + if !strings.Contains(joinedLower, want) { + t.Errorf("missing-key hint must mention %q so operator knows the recovery path; got hints:\n%s", want, strings.Join(hints, "\n")) + } + } +} + // === GetPublicKey / GetPublicKeyFromFile === // Contract: GetPublicKey returns the recipient string from an diff --git a/pkg/commands/init.go b/pkg/commands/init.go index cc391e63..f0ccc8de 100644 --- a/pkg/commands/init.go +++ b/pkg/commands/init.go @@ -19,6 +19,8 @@ import ( "bytes" "fmt" "io" + "net" + "net/url" "os" "path/filepath" "regexp" @@ -114,14 +116,15 @@ func resolveTalosconfigEndpoints(globalEndpoints []string) []string { // //nolint:gochecknoglobals // cobra flag binding requires a stable address var initCmdFlags struct { - force bool - preset string - name string - talosVersion string - image string - update bool - encrypt bool - decrypt bool + force bool + preset string + name string + talosVersion string + image string + clusterEndpoint string + update bool + encrypt bool + decrypt bool } // initCmd represents the `init` command. @@ -148,6 +151,30 @@ var initCmd = &cobra.Command{ ) } + // Same constraint applies to --cluster-endpoint: it rewrites + // values.yaml::endpoint at write time on initial init only. + // On --encrypt / --decrypt / --update the flag would have no + // effect, so surface the mismatch loudly instead of silently + // discarding the operator's intent. + if initCmdFlags.clusterEndpoint != "" && (initCmdFlags.encrypt || initCmdFlags.decrypt || initCmdFlags.update) { + return errors.WithHint( + errors.New("--cluster-endpoint is honored on initial init only; not valid with --encrypt, --decrypt, or --update"), + "drop --cluster-endpoint and edit values.yaml manually if you need to change the control-plane URL on an existing project, or re-run init without --encrypt/--decrypt/--update", + ) + } + + // Validate the URL shape of --cluster-endpoint up front so a + // malformed value short-circuits before any files are + // written. The same validation runs again at the + // resolveClusterEndpoint call site in RunE as defense in + // depth, but the early check keeps a half-initialised + // project tree off disk when the operator typos the flag. + if initCmdFlags.clusterEndpoint != "" { + if err := validateClusterEndpoint(initCmdFlags.clusterEndpoint); err != nil { + return err + } + } + // For -e, -d, and -u flags, always check that we're in a project root if initCmdFlags.encrypt || initCmdFlags.decrypt || initCmdFlags.update { // Verify that Config.RootDir is actually a project root @@ -699,6 +726,15 @@ var initCmd = &cobra.Command{ return err } + // Resolve the Kubernetes control-plane URL up front. Failing + // here (malformed --cluster-endpoint) short-circuits before + // any files are written so the project tree never lands in + // a half-initialised state. + clusterEndpoint, err := resolveClusterEndpoint(initCmdFlags.clusterEndpoint, GlobalArgs.Endpoints) + if err != nil { + return err + } + for path, content := range presetFiles { parts := strings.SplitN(path, "/", 2) chartName := parts[0] @@ -716,6 +752,8 @@ var initCmd = &cobra.Command{ return err } + rendered = applyEndpointOverride(rendered, clusterEndpoint) + err = writeToDestination(rendered, file, presetFileMode) default: err = writeToDestination([]byte(content), file, presetFileMode) @@ -745,6 +783,15 @@ var initCmd = &cobra.Command{ printSecretsWarning() } + // Late hint when the operator left the cluster-control-plane + // URL unset: walk the rendered values.yaml on disk and check + // `endpoint:` is still an empty string. The check reads the + // written file rather than relying on resolveClusterEndpoint's + // return because the preset itself may have shipped a non- + // empty default (which we'd leave alone). This way the hint + // only fires when the operator genuinely has more work to do. + printClusterEndpointHintIfEmpty(filepath.Join(Config.RootDir, valuesYamlName)) + return nil }, } @@ -794,6 +841,111 @@ func readChartYamlPreset() (string, error) { ) } +// endpointLineRe matches the top-level `endpoint:` line in a +// preset values.yaml regardless of YAML serialization style — +// double-quoted, single-quoted, unquoted, with or without a +// trailing comment. Same shape as imageLineRe so that +// applyEndpointOverride round-trips identically across presets +// the chart authors write in different styles. +var endpointLineRe = regexp.MustCompile(`(?m)^endpoint:(\s|$).*$`) + +// resolveClusterEndpoint picks the value to embed in +// values.yaml::endpoint, applying precedence from operator- +// explicit to inferred: +// +// 1. --cluster-endpoint flag wins (validated as URL with +// scheme, host, port). +// 2. --endpoints with exactly one entry auto-derives +// `https://:6443`. The single-target case is +// unambiguous; picking one of N multi-control-plane +// endpoints would silently couple cluster availability to +// that node, so multi-endpoint inputs return empty here. +// 3. Empty otherwise — the init flow prints a hint at the end +// pointing the operator at values.yaml::endpoint. +// +// The chart leaves endpoint empty by default so a missed +// override fails loudly rather than silently embedding a +// placeholder. The single-endpoint auto-derive does not violate +// this intent: the operator stated their endpoint explicitly +// via --endpoints, the derived value is the canonical +// `https://:6443` form, and a missed flag still produces +// an empty endpoint + the late hint. +func resolveClusterEndpoint(clusterEndpointFlag string, endpoints []string) (string, error) { + if clusterEndpointFlag != "" { + if err := validateClusterEndpoint(clusterEndpointFlag); err != nil { + return "", err + } + + return clusterEndpointFlag, nil + } + + if len(endpoints) == 1 { + return "https://" + net.JoinHostPort(endpoints[0], "6443"), nil + } + + return "", nil +} + +// validateClusterEndpoint requires the URL to have an explicit +// scheme, host, and port — the three pieces kube-apiserver +// clients need to dial the control plane. Looser shapes (bare +// host, missing scheme, missing port) tend to be operator +// typos that surface much later as unreachable-control-plane +// errors deep in apply / upgrade flows; catching them at the +// init boundary keeps the failure mode close to the cause. +func validateClusterEndpoint(endpoint string) error { + parsed, err := url.Parse(endpoint) + if err != nil { + //nolint:wrapcheck // cockroachdb/errors.WithHint at boundary. + return errors.WithHint( + errors.Wrapf(err, "parsing cluster-endpoint %q", endpoint), + "--cluster-endpoint must be a full URL with scheme, host, and port (e.g. `https://10.0.0.1:6443` or `https://vip.example.test:6443`)", + ) + } + + if parsed.Scheme == "" || parsed.Host == "" || parsed.Port() == "" { + //nolint:wrapcheck // cockroachdb/errors.WithHint at boundary. + return errors.WithHint( + errors.Newf("cluster-endpoint %q is missing scheme, host, or port", endpoint), + "--cluster-endpoint must be a full URL with scheme, host, and port (e.g. `https://10.0.0.1:6443` or `https://vip.example.test:6443`)", + ) + } + + return nil +} + +// applyEndpointOverride returns values with the top-level +// `endpoint:` line replaced so it points at override. An empty +// override returns values unchanged. Unlike applyImageOverride, +// a preset that does not declare a top-level `endpoint:` line is +// NOT a hard error — the auto-derive path (single-endpoint +// implicit) shouldn't fail the init flow when the chart simply +// doesn't expose the field. Operator-explicit `--cluster-endpoint` +// on such a preset is validated up the call chain (the caller +// can choose to surface a warning); the helper itself stays +// best-effort. +// +// The override is %q-quoted (same as applyImageOverride) so +// special characters in the URL (rarely present in practice +// but possible in DNS names with quoted segments) are escaped. +// ReplaceAllFunc is used to defeat regex.ReplaceAll's +// $-expansion in the replacement. +func applyEndpointOverride(values []byte, override string) []byte { + if override == "" { + return values + } + + if !endpointLineRe.Match(values) { + return values + } + + replacement := fmt.Appendf(nil, "endpoint: %q", override) + + return endpointLineRe.ReplaceAllFunc(values, func([]byte) []byte { + return replacement + }) +} + // imageLineRe matches the top-level `image:` line in a preset // values.yaml regardless of YAML serialization style — double-quoted, // single-quoted, unquoted, with or without a trailing comment. @@ -1273,6 +1425,7 @@ func init() { initCmd.Flags().StringVarP(&initCmdFlags.preset, "preset", "p", "", "preset for file generation (not required with --encrypt, --decrypt, or --update)") initCmd.Flags().StringVarP(&initCmdFlags.name, "name", "N", "", "cluster name (not required with --encrypt, --decrypt, or --update)") initCmd.Flags().StringVar(&initCmdFlags.image, "image", "", "override the Talos installer image written to the preset's values.yaml (e.g. factory.talos.dev/installer/:)") + initCmd.Flags().StringVar(&initCmdFlags.clusterEndpoint, "cluster-endpoint", "", "Kubernetes control-plane URL written to values.yaml::endpoint (e.g. https://10.0.0.1:6443 or https://vip.example.test:6443). Takes precedence over the single-endpoint auto-derive heuristic; required for multi-control-plane setups where the operator picks a VIP or load balancer.") initCmd.Flags().BoolVar(&initCmdFlags.force, "force", false, "overwrite existing files; on --update also auto-accepts every preset-template diff without the interactive prompt") initCmd.Flags().BoolVarP(&initCmdFlags.update, "update", "u", false, "update Talm library chart") // Override persistent -e flag for init command to use for encrypt @@ -1430,6 +1583,41 @@ func fileExists(file string) bool { return err == nil } +// emptyEndpointLineRe matches a top-level `endpoint:` line whose +// value is the empty string in any common YAML serialization +// shape: `endpoint: ""`, `endpoint: ”`, `endpoint:` followed by +// nothing, optional trailing comment. Used by +// printClusterEndpointHintIfEmpty to decide whether the operator +// still needs to fill the field after init. +var emptyEndpointLineRe = regexp.MustCompile(`(?m)^endpoint:\s*(?:""|''|)\s*(?:#.*)?$`) + +// printClusterEndpointHintIfEmpty surfaces a one-line operator +// hint pointing at values.yaml::endpoint when init completes +// with that field left empty. Hits the common case where the +// operator passed `--endpoints` (which populates talosconfig) +// but didn't also tell talm the Kubernetes control-plane URL — +// both shared the word "endpoint" so the distinction is easy to +// miss. The hint is purely advisory: the chart is allowed to +// ship with an empty endpoint and the operator can fill it +// manually later. +// +// Best-effort: I/O failures here are silently ignored (the hint +// is a nudge, not a contract). Presets that don't declare a +// top-level endpoint: field skip the hint too — the heuristic +// only matches an explicit empty assignment. +func printClusterEndpointHintIfEmpty(valuesPath string) { + data, err := os.ReadFile(valuesPath) + if err != nil { + return + } + + if !emptyEndpointLineRe.Match(data) { + return + } + + fmt.Fprintf(os.Stderr, "\nNext: set values.yaml::endpoint to your cluster's control-plane URL (e.g. `https://:6443` for a VIP setup, or `https://:6443` for an external load balancer).\nThe chart leaves this empty by design — `--endpoints` populates talosconfig (talosctl client routing), not the kube-apiserver URL nodes dial. Use `--cluster-endpoint` on `talm init` to set both in one go.\n") +} + func printSecretsWarning() { keyFile := filepath.Join(Config.RootDir, talmKeyName) keyFileExists := fileExists(keyFile) diff --git a/pkg/commands/init_test.go b/pkg/commands/init_test.go index 72e63b3d..708cdc31 100644 --- a/pkg/commands/init_test.go +++ b/pkg/commands/init_test.go @@ -598,6 +598,242 @@ func TestResolveTalosconfigEndpoints_MultipleEndpoints(t *testing.T) { } } +// TestResolveClusterEndpoint pins the precedence rules for the +// cluster-control-plane URL that lands in values.yaml::endpoint: +// +// 1. --cluster-endpoint (explicit, takes precedence) +// 2. --endpoints with exactly one entry (auto-derive +// https://:6443 — the unambiguous single-target case) +// 3. empty (operator must fill values.yaml manually) +// +// Multi-endpoint --endpoints MUST NOT auto-populate: in +// multi-node-control-plane topologies the right value is a VIP / +// LB / DNS RR, NOT one node picked arbitrarily; picking would +// silently couple cluster availability to that node. +func TestResolveClusterEndpoint(t *testing.T) { + t.Parallel() + + cases := []struct { + name string + clusterEndpoint string + endpoints []string + want string + wantErrContains string + }{ + { + name: "cluster-endpoint flag wins over single-endpoint derive", + clusterEndpoint: "https://vip.example.test:6443", + endpoints: []string{"192.0.2.10"}, + want: "https://vip.example.test:6443", + }, + { + name: "cluster-endpoint alone (no --endpoints)", + clusterEndpoint: "https://vip.example.test:6443", + endpoints: nil, + want: "https://vip.example.test:6443", + }, + { + name: "single endpoint auto-derives default-port URL", + clusterEndpoint: "", + endpoints: []string{"192.0.2.10"}, + want: "https://192.0.2.10:6443", + }, + { + name: "multi endpoint does NOT auto-derive (picking one is wrong)", + clusterEndpoint: "", + endpoints: []string{"192.0.2.10", "192.0.2.11", "192.0.2.12"}, + want: "", + }, + { + name: "no sources returns empty", + clusterEndpoint: "", + endpoints: nil, + want: "", + }, + { + name: "malformed cluster-endpoint rejected (no scheme)", + clusterEndpoint: "vip.example.test:6443", + endpoints: nil, + wantErrContains: "cluster-endpoint", + }, + { + name: "malformed cluster-endpoint rejected (no host)", + clusterEndpoint: "https://", + endpoints: nil, + wantErrContains: "cluster-endpoint", + }, + } + + for _, tc := range cases { + t.Run(tc.name, func(t *testing.T) { + t.Parallel() + + got, err := resolveClusterEndpoint(tc.clusterEndpoint, tc.endpoints) + if tc.wantErrContains != "" { + if err == nil { + t.Fatalf("expected error containing %q; got nil and value %q", tc.wantErrContains, got) + } + if !strings.Contains(err.Error(), tc.wantErrContains) { + t.Errorf("error must mention %q; got: %v", tc.wantErrContains, err) + } + + return + } + if err != nil { + t.Fatalf("unexpected error: %v", err) + } + if got != tc.want { + t.Errorf("got %q; want %q", got, tc.want) + } + }) + } +} + +// TestApplyEndpointOverride pins the values.yaml rewrite contract +// for the top-level `endpoint:` field: empty override leaves the +// file unchanged (single-source-of-truth: leave it for the +// operator), non-empty override replaces only that line and +// preserves surrounding content (other top-level keys, comments, +// trailing newlines). +func TestApplyEndpointOverride(t *testing.T) { + t.Parallel() + + original := []byte(`# Cluster endpoint. +endpoint: "" + +floatingIP: "" +vipLink: "" + +image: "ghcr.io/cozystack/cozystack/talos:v1.12.6" +podSubnets: +- 10.244.0.0/16 +`) + + t.Run("empty override returns input unchanged", func(t *testing.T) { + t.Parallel() + + got := applyEndpointOverride(original, "") + if !bytes.Equal(got, original) { + t.Errorf("empty override mutated the input:\n%s", got) + } + }) + + t.Run("non-empty override replaces only the endpoint line", func(t *testing.T) { + t.Parallel() + + got := applyEndpointOverride(original, "https://192.0.2.10:6443") + want := `endpoint: "https://192.0.2.10:6443"` + if !bytes.Contains(got, []byte(want)) { + t.Errorf("expected %q in output, got:\n%s", want, got) + } + + for _, marker := range []string{ + "# Cluster endpoint.", + "floatingIP: \"\"", + "vipLink: \"\"", + `image: "ghcr.io/cozystack/cozystack/talos:v1.12.6"`, + "podSubnets:", + "- 10.244.0.0/16", + } { + if !bytes.Contains(got, []byte(marker)) { + t.Errorf("override stripped surrounding content %q:\n%s", marker, got) + } + } + }) + + t.Run("values without endpoint field returns input unchanged", func(t *testing.T) { + t.Parallel() + + // Unlike --image, a non-existent endpoint: line is not an + // error — the auto-derive path is implicit and shouldn't + // fail the init flow. Operator-explicit + // --cluster-endpoint on a preset without the field is + // validated up-front in resolveClusterEndpoint's caller + // (init RunE), not here. + noEndpoint := []byte("image: foo\n") + got := applyEndpointOverride(noEndpoint, "https://192.0.2.10:6443") + if !bytes.Equal(got, noEndpoint) { + t.Errorf("override on values without endpoint: line must be a no-op; got:\n%s", got) + } + }) +} + +// TestPrintClusterEndpointHintIfEmpty pins the heuristic that +// decides whether to nudge the operator about +// values.yaml::endpoint. The hint MUST fire only when the field +// is left explicitly empty and MUST stay silent otherwise (file +// missing, preset without the field, operator already filled it). +// +// Parent is non-parallel: captureStderr swaps os.Stderr globally +// for the duration of fn, and concurrent subtests would race on +// the shared fd. The serial body is cheap (six tempdirs + +// six regex matches). +func TestPrintClusterEndpointHintIfEmpty(t *testing.T) { + const fileMissingSentinel = "<>" + + cases := []struct { + name string + fileContent string + wantHint bool + }{ + { + name: "explicit empty endpoint surfaces hint", + fileContent: "endpoint: \"\"\nimage: foo\n", + wantHint: true, + }, + { + name: "single-quoted empty endpoint surfaces hint", + fileContent: "endpoint: ''\nimage: foo\n", + wantHint: true, + }, + { + name: "bare-colon empty endpoint surfaces hint", + fileContent: "endpoint:\nimage: foo\n", + wantHint: true, + }, + { + name: "populated endpoint stays silent", + fileContent: "endpoint: \"https://10.0.0.1:6443\"\nimage: foo\n", + wantHint: false, + }, + { + name: "preset without endpoint field stays silent", + fileContent: "image: foo\n", + wantHint: false, + }, + { + name: "missing file stays silent (best-effort)", + fileContent: fileMissingSentinel, + wantHint: false, + }, + } + + for _, tc := range cases { + // captureStderr swaps os.Stderr globally for the duration of + // fn — running subtests in parallel would race on the shared + // fd. Keep these sequential while the parent still benefits + // from t.Parallel relative to other tests in the package. + t.Run(tc.name, func(t *testing.T) { + dir := t.TempDir() + path := filepath.Join(dir, valuesYamlName) + if tc.fileContent != fileMissingSentinel { + if err := os.WriteFile(path, []byte(tc.fileContent), 0o644); err != nil { + t.Fatal(err) + } + } + + got := captureStderr(t, func() { + printClusterEndpointHintIfEmpty(path) + }) + + gotHint := strings.Contains(got, "Next: set values.yaml::endpoint") + if gotHint != tc.wantHint { + t.Errorf("wantHint=%v gotHint=%v; stderr=%q", tc.wantHint, gotHint, got) + } + }) + } +} + func TestWriteSecretsBundleToFile_StillRefusesOverwrite(t *testing.T) { forceOrig := initCmdFlags.force rootOrig := Config.RootDir