Add Opsgenie alias field for alert deduplication #1292

Open
cairon-ab wants to merge 2 commits into fluxcd:main from cairon-ab:opsgenie-alias-dedup

@cairon-ab

Summary

Set the alias field on Opsgenie alerts using a SHA-256 hash of the involved object's Kind, Namespace, Name, and event Reason. This allows Opsgenie to deduplicate repeated alerts for the same source instead of creating new pages for each firing.

Changes

  • Added Alias field to OpsgenieAlert struct with json:"alias,omitempty"
  • Added generateOpsgenieAlias() function that creates a deterministic SHA-256 hash from the event's InvolvedObject.Kind, InvolvedObject.Namespace, InvolvedObject.Name, and Reason
  • Wired the alias into Post() so every alert carries a stable alias
  • Added comprehensive tests: TestOpsgenie_PostAlias (verifies alias in HTTP payload across multiple scenarios) and TestGenerateOpsgenieAlias (determinism, length, uniqueness)
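
As a rough illustration of the `omitempty` tag mentioned above, the sketch below marshals a reduced payload with and without an alias. The `alertPayload` type and `encode` helper are hypothetical stand-ins, not the controller's actual `OpsgenieAlert` struct; only the `json:"alias,omitempty"` tag comes from this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// alertPayload is a reduced, hypothetical version of the alert body.
// Only the alias tag is taken from the PR description.
type alertPayload struct {
	Message string `json:"message"`
	Alias   string `json:"alias,omitempty"`
}

// encode marshals the payload, panicking on error to keep the sketch short.
func encode(p alertPayload) string {
	b, err := json.Marshal(p)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// With an empty alias the key is omitted from the JSON entirely.
	fmt.Println(encode(alertPayload{Message: "reconciliation failed"}))
	// With a non-empty alias the key is emitted.
	fmt.Println(encode(alertPayload{Message: "reconciliation failed", Alias: "abc123"}))
}
```

Because of `omitempty`, a payload built without an alias serializes without the `alias` key at all.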

Why

Currently, Opsgenie creates a new alert for every notification event, even when it's for the same object and reason. The Opsgenie Alert API supports an alias field that acts as a deduplication key — alerts with the same alias are grouped as a single incident.

By hashing Kind/Namespace/Name/Reason, we ensure:

  • Determinism: Same event source always produces the same alias
  • Deduplication: Repeated firings update the existing alert instead of paging again
  • Differentiation: Different reasons (e.g. ReconciliationFailed vs HealthCheckFailed) create separate alerts
  • Size safety: The 64-char hex hash stays well within Opsgenie's 512-char alias limit
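
The properties listed above can be sketched roughly as follows. This is not the code from the PR: `objectRef` and `aliasFor` are hypothetical names, and joining the fields with "/" is an assumption, since the description does not say how the hash input is assembled.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// objectRef is a hypothetical stand-in for the event's InvolvedObject.
type objectRef struct {
	Kind, Namespace, Name string
}

// aliasFor hashes the identifying fields into a 64-character hex string.
// The "/" separator is an assumption made for this sketch.
func aliasFor(obj objectRef, reason string) string {
	input := strings.Join([]string{obj.Kind, obj.Namespace, obj.Name, reason}, "/")
	sum := sha256.Sum256([]byte(input))
	return hex.EncodeToString(sum[:])
}

func main() {
	obj := objectRef{Kind: "Kustomization", Namespace: "flux-system", Name: "flux-system"}
	// Same object, different reasons: two distinct aliases.
	fmt.Println(aliasFor(obj, "ReconciliationFailed"))
	fmt.Println(aliasFor(obj, "HealthCheckFailed"))
}
```

Calling `aliasFor` twice with the same arguments returns the same string, which is what lets repeated firings map onto one alert.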

Fixes #460

Set the alias field on Opsgenie alerts using a SHA-256 hash of the
involved object's Kind, Namespace, Name, and event Reason. This allows
Opsgenie to deduplicate repeated alerts for the same source instead of
creating new pages for each firing.

Fixes fluxcd#460

Signed-off-by: Cairon <cairon-ab@users.noreply.github.com>
Comment on lines +112 to +115
event.InvolvedObject.Kind,
event.InvolvedObject.Namespace,
event.InvolvedObject.Name,
event.Reason,
Member

@stefanprodan, Apr 10, 2026

This will cause alerts from multiple clusters to aggregate under the same incident since all clusters have Kustomization/flux-system/flux-system, which is a major breaking change. Adding the Alert Provider UID to the checksum would ensure each cluster gets a dedicated incident.

Author

Good catch — in a multi-cluster setup Kustomization/flux-system/flux-system would hash identically across clusters, collapsing separate incidents into one.

ProviderUID is already available in notifierOptions but opsgenieNotifierFunc doesn't pass it through to NewOpsgenie. I'll thread it into the Opsgenie struct and prepend it to the alias hash input so each Provider (and therefore each cluster) gets a unique alias. Will push the fix shortly.

Member

this looks like AI talking

Author

Fair point — I did lean on AI for drafting that reply. The fix is in f322c46 though: threads ProviderUID into the Opsgenie struct and prepends it to the alias hash, so each cluster's Provider produces distinct aliases. Added tests for the multi-cluster case. Happy to adjust if the approach doesn't look right.

Without the provider UID in the alias hash, alerts from different
clusters sharing the same involved object (e.g.
Kustomization/flux-system/flux-system) would produce identical aliases
and Opsgenie would aggregate them into a single incident.

Thread the ProviderUID from notifierOptions into the Opsgenie struct
and prepend it to the alias hash input so each cluster's Provider
resource produces a unique alias.

Signed-off-by: cairon-ab <cairon-ab@users.noreply.github.com>
Signed-off-by: Cairon <cairon-ab@users.noreply.github.com>
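
For reviewers, a minimal sketch of the multi-cluster fix described in this commit message. The `scopedAlias` function, its parameter order, the "/" separator, and the UID strings are all assumptions for illustration; the actual change is in f322c46.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// scopedAlias prepends the provider UID to the hash input so that two
// clusters with the same Kind/Namespace/Name/Reason yield different aliases.
// The "/" separator is an assumption made for this sketch.
func scopedAlias(providerUID, kind, namespace, name, reason string) string {
	input := strings.Join([]string{providerUID, kind, namespace, name, reason}, "/")
	sum := sha256.Sum256([]byte(input))
	return hex.EncodeToString(sum[:])
}

func main() {
	// Hypothetical provider UIDs, one per cluster.
	a := scopedAlias("uid-cluster-a", "Kustomization", "flux-system", "flux-system", "ReconciliationFailed")
	b := scopedAlias("uid-cluster-b", "Kustomization", "flux-system", "flux-system", "ReconciliationFailed")
	fmt.Println(a)
	fmt.Println(b)
}
```

The two printed aliases differ even though the involved object and reason are identical, so each cluster would keep its own alert.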
Development

Successfully merging this pull request may close these issues.

Opsgenie Alias for deduplication of alerts

3 participants