
feat(tests/integration): Phase 0 harness + smoke (envtest + linstor CLI)#5

Closed
kvaps wants to merge 1 commit into main from feat/integration-harness-phase0

Conversation


@kvaps kvaps commented May 14, 2026

Summary

Build the Tier 2 integration-test scaffold per docs/test-strategy.md so the 12 Phase 1 group agents have a working harness to extend.

The scaffold boots envtest + the full controller-runtime manager + the LINSTOR REST server on a free 127.0.0.1 port, drives it with the upstream linstor Python CLI (--machine-readable), and ships an in-process satellite mock that writes the healthy steady-state Status fields (Node Ready, StoragePool FreeCapacity, Resource UpToDate) without ever shelling out. Every file under tests/integration/ is guarded by //go:build integration so the default go test ./... flow is untouched.
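For reference, the guard at the top of each integration file is the standard Go build-constraint comment (package name here is an assumption):

```go
//go:build integration

package integration
```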

What's in this PR

tests/integration/

  • harness/envtest.go — boots envtest.Environment{} with the CRDs from config/crd/bases/; reads KUBEBUILDER_ASSETS and fails with an actionable message if unset
  • harness/manager.go — wires every reconciler cmd/controller/main.go registers, plus the REST server on a free port, plus the satellite mock, and waits for /v1/healthz before returning
  • harness/satellite.go — in-process steady-state simulator. Knobs: SimulatePoolMissing(node, sp), SimulateDRBDState(rd, node, state), FailNext(cmd)
  • harness/linstor.go — exec.Command("linstor", "--controllers", URL, "--machine-readable", args...) with python-traceback / xml.etree / HTTPConnectionPool guards (Bug-59 class). JSON() flattens the [[obj…]] envelope the CLI emits
  • harness/csi.go — exposes pkg/csi-driver.Driver plus the underlying lapi.Client against the in-process REST URL. See caveat 1 below
  • harness/fixtures.go — SeedThreeNodeCluster creates 3 Nodes (worker-1/2/3), 9 StoragePools (lvm-thin/zfs-thin/file × 3), default RG. Idempotent
  • harness/asserts.go — Eventually, generic MustList[L, T], generic MustGet[T], WaitForDRBDState
  • harness/concurrent.go — RunParallel(t, n, fn) with panic propagation to t.Fatal
  • smoke_test.go — single test TestSmokeNodeList. Validates the whole stack in one round-trip

Reconcilers wired into the manager

In strict lockstep with cmd/controller/main.go:

  1. NodeReconciler
  2. NodeHeartbeatReconciler
  3. NodeLabelSyncReconciler
  4. StoragePoolReconciler
  5. ResourceGroupReconciler
  6. RGRebalanceReconciler
  7. ResourceDefinitionReconciler
  8. ResourceReconciler
  9. ResourceMigrationReconciler
  10. SnapshotReconciler
  11. AutoSnapshotRunnable
  12. AutoEvictReconciler

Plus the harness.Satellite mock (manager.Runnable) and pkg/rest.Server (manager.Runnable).

Validation

```
$ export KUBEBUILDER_ASSETS=$(setup-envtest use --print path 1.34.x)
$ go test -tags=integration -count=1 ./tests/integration/... -run '^TestSmokeNodeList' -v
=== RUN TestSmokeNodeList
--- PASS: TestSmokeNodeList (6.32s)
PASS
ok github.com/cozystack/blockstor/tests/integration 6.352s
? github.com/cozystack/blockstor/tests/integration/harness [no test files]
```

```
$ go build -tags=integration ./tests/integration/...

clean

$ golangci-lint run --build-tags=integration ./tests/integration/...
0 issues.

$ go vet ./...

clean

$ go test -count=1 ./tests/integration/...
go: warning: "./tests/integration/..." matched no packages
no packages to test # confirms: existing go test ./... is unaffected by the new files
```

Caveats / TODOs for Phase 1

  1. CSI is not gRPC yet. cmd/csi-plugin/main.go does not exist in the repo today; pkg/csi-driver.Driver is a lapi.Client-backed shim that the real linstor-csi sidecar wraps. harness.CSI follows the existing project shape: Driver + raw lapi.Client against the in-process REST URL. When/if the project grows a gRPC CSI binary, harness.NewCSI can return a buffered-listener gRPC stack with csi.IdentityClient / csi.ControllerClient / csi.NodeClient instead — Phase 1 Group J authors should call this out in their PR if they need it
  2. Satellite.FailNext is a no-op today — the mock never shells out, so there is nothing to fail. The slot exists so Phase 1 Group F (Bug 83 reproduction) can declare the intent without churning the harness API. If a Phase 1 test needs real FakeExec wiring through the simulator, that's a harness extension and should come back here, not be inlined into a group file
  3. AutoDiskfulReconciler is not wired because cmd/controller/main.go itself does not wire it. The wave2/auto-diskful branch landed the reconciler but never plumbed it into the main bootstrap. Mirroring main exactly (per the task brief) means this harness has the same gap; the production binary's behavior is what these tests need to assert against
  4. Resource.Status.DrbdState is the per-resource DRBD state — the test-strategy doc's "DRBDStateOnNode" wording maps onto Status.DrbdState (the existing CRD field). WaitForDRBDState(t, stack, rd, node, want) does the rd+node→Resource-name resolution via the CEL-pinned <rd>.<node> convention

Test plan

  • go build -tags=integration ./tests/integration/... is clean
  • KUBEBUILDER_ASSETS=… go test -tags=integration -count=1 ./tests/integration/... -run '^TestSmokeNodeList' passes locally
  • golangci-lint run --build-tags=integration ./tests/integration/... reports 0 issues
  • go test ./... (no build tag) is unaffected: integration files compile only with -tags=integration
  • CI green on the PR

Build the Tier 2 integration-test scaffold per docs/test-strategy.md so
the 12 Phase 1 group agents have a working harness to extend.

The scaffold boots envtest + the full controller-runtime manager + the
LINSTOR REST server on a free port, drives it with the real upstream
`linstor` CLI, and ships an in-process satellite mock that writes the
healthy steady-state Status fields (Ready, FreeCapacity, UpToDate)
without shelling out.

Files all guarded by `//go:build integration` so `go test ./...`
without the tag is unaffected.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
@coderabbitai

coderabbitai Bot commented May 14, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 31df697c-275c-4322-a586-165b181b2837

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/integration-harness-phase0

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.


kvaps commented May 14, 2026

Cherry-picked harness files into main directly (dev workflow, no PR review). Playbook revert dropped — keeping the no-PR convention.

@kvaps kvaps closed this May 14, 2026

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a comprehensive Tier 2 integration test harness for the blockstor project, featuring an envtest setup, a controller-runtime manager with all reconcilers wired, a LINSTOR-compatible REST server, and a satellite mock. The review feedback highlights two necessary improvements for the satellite mock: implementing periodic heartbeat updates to prevent nodes from being marked offline during long tests and refining the storage pool capacity defaulting logic to allow for the simulation of completely full pools.

Comment on lines +192 to +194
```go
if node.Status.ConnectionStatus == desiredStatus && hasReadyTrue(node.Status.Conditions) {
	continue
}
```

medium

The current logic skips updating the node status if it is already Online and Ready. This means LastHeartbeatTime is only set once and never updated again. If the NodeHeartbeatReconciler (which is wired in the manager) enforces a heartbeat timeout, nodes might be marked as offline during long-running tests. Consider updating the heartbeat timestamp periodically (e.g., every few seconds) even if the status hasn't changed.

Comment on lines +281 to +284
```go
free := pool.Status.FreeCapacity
if free == 0 {
	free = total
}
```

medium

This defaulting logic makes it impossible to simulate a completely full storage pool (where FreeCapacity is 0). If a test intentionally sets FreeCapacity to 0, the mock will overwrite it with the total capacity. Consider applying these defaults only when the status is uninitialized (e.g., when TotalCapacity is 0).

Suggested change:

```diff
 free := pool.Status.FreeCapacity
-if free == 0 {
+if free == 0 && pool.Status.TotalCapacity == 0 {
 	free = total
 }
```
