Add event recording and status conditions for worker deployments#203
thearcticwatch wants to merge 8 commits into main
Conversation
carlydf
left a comment
Also, `make fmt-imports` will solve some of your lint errors.
carlydf
left a comment
looking good! just did an initial review; we should still add a functional test once these comments are addressed.
I found https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#events and https://book.kubebuilder.io/reference/raising-events#creating-events helpful while reviewing.
internal/controller/execplan.go (Outdated):

```go
}
if err != nil {
	r.Recorder.Eventf(workerDeploy, corev1.EventTypeWarning, "TestWorkflowStartFailed",
		"Failed to start gate workflow %q (buildID %s): %v", wf.workflowType, wf.buildID, err)
```
can you put the task queue name in this event?
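A minimal sketch of that suggestion. Only `workflowType` and `buildID` appear in the diff, so `wf.taskQueue` is an assumed field name:

```go
// Include the task queue so the event tells the operator where the gate
// workflow failed to start. wf.taskQueue is a hypothetical field name.
if err != nil {
	r.Recorder.Eventf(workerDeploy, corev1.EventTypeWarning, "TestWorkflowStartFailed",
		"Failed to start gate workflow %q on task queue %q (buildID %s): %v",
		wf.workflowType, wf.taskQueue, wf.buildID, err)
}
```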
```go
if err := r.executeK8sOperations(ctx, l, workerDeploy, p); err != nil {
	return err
}

deploymentHandler := temporalClient.WorkerDeploymentClient().GetHandle(p.WorkerDeploymentName)

if err := r.startTestWorkflows(ctx, l, workerDeploy, temporalClient, p); err != nil {
	return err
}

if err := r.updateVersionConfig(ctx, l, workerDeploy, deploymentHandler, p); err != nil {
```
was this refactor just to make the code easier to understand, or was it necessary for the events change? (I'm all for it, just curious)
Yes, the tests were failing on Cognitive Complexity and the refactor fixed them. The extra readability is just a bonus.
```go
r.setCondition(&workerDeploy, temporaliov1alpha1.ConditionRolloutReady, metav1.ConditionTrue,
	"RolloutSucceeded", "Target version rollout complete "+workerDeploy.Status.TargetVersion.BuildID)
```
It feels inconsistent to me to be saying "rollout ready", "rollout succeeded", and "rollout complete" all together. Maybe just "rollout complete" for all of them?
Also, I just realized: the condition as written will cause us to emit a ConditionRolloutReady event at the end of the first reconcile loop that meets this condition (which is correct), but also at the end of every subsequent reconcile loop, which means we will emit the same event every 30s.
I think if we instead emit this event in the actual updateVersionConfig function, where we successfully set the Current Version, that would ensure we only emit it once per completed rollout.
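A sketch of that suggestion, reusing names from the diffs in this PR (`updateVersionConfig`, `workerDeploy`); the exact surrounding code is an assumption:

```go
// Inside updateVersionConfig, immediately after the version update
// succeeds on the server. This path only runs on the reconcile that
// actually flips the current version, so the event fires once per
// completed rollout instead of on every 30s requeue.
r.Recorder.Eventf(workerDeploy, corev1.EventTypeNormal, "RolloutComplete",
	"Rollout complete for build %s", workerDeploy.Status.TargetVersion.BuildID)
```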
```go
//+kubebuilder:rbac:groups=core,resources=secrets,verbs=get;list;watch
//+kubebuilder:rbac:groups=apps,resources=deployments,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=apps,resources=deployments/scale,verbs=update
// +kubebuilder:rbac:groups="",resources=events,verbs=create;patch
```
https://book.kubebuilder.io/reference/raising-events#granting-the-required-permissions says that this would be enough:

```go
// +kubebuilder:rbac:groups=events.k8s.io,resources=events,verbs=create;patch
```

so I think `kubebuilder:rbac:groups=""` may be overly permissive?
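For reference, a sketch of the narrower marker together with the usual recorder wiring in main.go; the recorder name "temporal-worker-controller" is an assumption:

```go
// Narrower RBAC marker suggested above:
// +kubebuilder:rbac:groups=events.k8s.io,resources=events,verbs=create;patch

// controller-runtime's manager provides a named recorder:
r := &TemporalWorkerDeploymentReconciler{
	Client:   mgr.GetClient(),
	Scheme:   mgr.GetScheme(),
	Recorder: mgr.GetEventRecorderFor("temporal-worker-controller"),
}
```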
```go
r.Recorder.Eventf(&workerDeploy, corev1.EventTypeWarning, "TemporalConnectionNotFound",
	"Unable to fetch TemporalConnection %q: %v", workerDeploy.Spec.WorkerOptions.TemporalConnectionRef.Name, err)
r.setCondition(&workerDeploy, temporaliov1alpha1.ConditionTemporalConnectionHealthy, metav1.ConditionFalse,
	"TemporalConnectionNotFound", fmt.Sprintf("TemporalConnection %q not found: %v", workerDeploy.Spec.WorkerOptions.TemporalConnectionRef.Name, err))
_ = r.Status().Update(ctx, &workerDeploy)
```
Seems like this pattern of setting a condition and recording an event is repeated throughout the codebase; I think we should clean this up with a helper, which would also make the call sites neater.
wdyt?
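One possible shape for such a helper (signature and names are illustrative, not from this PR): it records the warning event, sets the matching condition, and persists status in one call.

```go
// recordFailure pairs a Warning event with a False condition and a
// best-effort status update, so call sites collapse to one line.
// condType is typed as string here for illustration; the real type
// presumably matches setCondition's signature.
func (r *TemporalWorkerDeploymentReconciler) recordFailure(
	ctx context.Context,
	wd *temporaliov1alpha1.TemporalWorkerDeployment,
	condType string, reason, msgFmt string, args ...any,
) {
	msg := fmt.Sprintf(msgFmt, args...)
	r.Recorder.Event(wd, corev1.EventTypeWarning, reason, msg)
	r.setCondition(wd, condType, metav1.ConditionFalse, reason, msg)
	if err := r.Status().Update(ctx, wd); err != nil {
		// Best effort: the event and logs still carry the failure.
		log.FromContext(ctx).Error(err, "failed to persist status condition", "reason", reason)
	}
}
```

The TemporalConnection call site above would then collapse to a single `r.recordFailure(...)` call.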
```diff
 )

-func (r *TemporalWorkerDeploymentReconciler) executePlan(ctx context.Context, l logr.Logger, temporalClient sdkclient.Client, p *plan) error {
+func (r *TemporalWorkerDeploymentReconciler) executeK8sOperations(ctx context.Context, l logr.Logger, workerDeploy *temporaliov1alpha1.TemporalWorkerDeployment, p *plan) error {
```
Any specific reason to rename this? I quite liked the approach of having these three files/steps where we first generate a status, make a plan, and then act on it.
```go
	ConflictToken: vcfg.ConflictToken,
	Identity:      getControllerIdentity(),
}); err != nil {
	r.Recorder.Eventf(workerDeploy, corev1.EventTypeWarning, "VersionRegistrationFailed",
```
We should also log the error here, in addition to recording the event, imo. We follow this pattern elsewhere in this file and in the codebase, and I think it's good practice: if we do have a persistence failure, the logs are still an option for an operator.
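A sketch of the pattern being asked for; `p.WorkerDeploymentName` is borrowed from the surrounding diffs and the message text is illustrative:

```go
if err != nil {
	// Structured log first: still visible to operators even if the
	// Event can't be persisted to the API server.
	l.Error(err, "failed to register worker deployment version",
		"deployment", p.WorkerDeploymentName)
	r.Recorder.Eventf(workerDeploy, corev1.EventTypeWarning, "VersionRegistrationFailed",
		"Failed to register version for deployment %q: %v", p.WorkerDeploymentName, err)
	return fmt.Errorf("registering version: %w", err)
}
```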
```diff
 	r.Recorder.Eventf(&workerDeploy, corev1.EventTypeWarning, "PlanGenerationFailed",
 		"Unable to generate reconciliation plan: %v", err)
 	return ctrl.Result{}, err
 }

 // Execute the plan, handling any errors
-if err := r.executePlan(ctx, l, temporalClient, plan); err != nil {
+if err := r.executePlan(ctx, l, &workerDeploy, temporalClient, plan); err != nil {
 	r.Recorder.Eventf(&workerDeploy, corev1.EventTypeWarning, "PlanExecutionFailed",
 		"Unable to execute reconciliation plan: %v", err)
```
Any specific reason you chose not to persist failures for the plan generation and execution phases, but did persist when we generate the status from the server?
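If there's no particular reason, a sketch of persisting the execution failure too, mirroring the status-generation path (the condition type and reason used here are illustrative):

```go
if err := r.executePlan(ctx, l, &workerDeploy, temporalClient, plan); err != nil {
	r.Recorder.Eventf(&workerDeploy, corev1.EventTypeWarning, "PlanExecutionFailed",
		"Unable to execute reconciliation plan: %v", err)
	// Persist the failure as a condition as well, so it stays visible
	// in `kubectl describe` output even after the event expires.
	r.setCondition(&workerDeploy, temporaliov1alpha1.ConditionRolloutReady, metav1.ConditionFalse,
		"PlanExecutionFailed", err.Error())
	_ = r.Status().Update(ctx, &workerDeploy)
	return ctrl.Result{}, err
}
```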
What changed: Added Kubernetes events and status conditions (TemporalConnectionHealthy, RolloutReady) to the worker controller reconciliation loop.

Why: Reconciliation failures were only visible in controller logs; events and conditions let users diagnose issues directly via kubectl.

Closes #28 (Add events to the TemporalWorkerDeployment CRD when there is a problem)

How was this tested: added unit tests

Any docs updates needed? N/A