Add Opsgenie alias field for alert deduplication #1292
Set the alias field on Opsgenie alerts using a SHA-256 hash of the involved object's Kind, Namespace, Name, and event Reason. This allows Opsgenie to deduplicate repeated alerts for the same source instead of creating new pages for each firing.
Fixes fluxcd#460
Signed-off-by: Cairon <cairon-ab@users.noreply.github.com>
event.InvolvedObject.Kind,
event.InvolvedObject.Namespace,
event.InvolvedObject.Name,
event.Reason,
This will cause alerts from multiple clusters to aggregate under the same incident since all clusters have Kustomization/flux-system/flux-system, which is a major breaking change. Adding the Alert Provider UID to the checksum would ensure each cluster gets a dedicated incident.
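To illustrate the collision described above, here is a minimal sketch, not the PR's actual code. The `involvedObject` type, the `aliasWithoutProvider` helper, and the `/` separator are assumptions made for this example. It hashes only Kind, Namespace, Name, and Reason, so two clusters reporting the same object and reason end up with the same alias.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"strings"
)

// involvedObject is a hypothetical stand-in for the event's InvolvedObject.
type involvedObject struct {
	Kind, Namespace, Name string
}

// aliasWithoutProvider hashes only Kind, Namespace, Name, and Reason.
// The "/" separator is an assumption of this sketch.
func aliasWithoutProvider(obj involvedObject, reason string) string {
	input := strings.Join([]string{obj.Kind, obj.Namespace, obj.Name, reason}, "/")
	return fmt.Sprintf("%x", sha256.Sum256([]byte(input)))
}

func main() {
	// Every cluster bootstrapped with Flux has this object.
	obj := involvedObject{Kind: "Kustomization", Namespace: "flux-system", Name: "flux-system"}

	// Nothing cluster-specific enters the hash, so both clusters agree.
	clusterA := aliasWithoutProvider(obj, "ReconciliationFailed")
	clusterB := aliasWithoutProvider(obj, "ReconciliationFailed")
	fmt.Println(clusterA == clusterB) // prints "true"
}
```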
Good catch — in a multi-cluster setup Kustomization/flux-system/flux-system would hash identically across clusters, collapsing separate incidents into one.
ProviderUID is already available in notifierOptions but opsgenieNotifierFunc doesn't pass it through to NewOpsgenie. I'll thread it into the Opsgenie struct and prepend it to the alias hash input so each Provider (and therefore each cluster) gets a unique alias. Will push the fix shortly.
Fair point — I did lean on AI for drafting that reply. The fix is in f322c46 though: threads ProviderUID into the Opsgenie struct and prepends it to the alias hash, so each cluster's Provider produces distinct aliases. Added tests for the multi-cluster case. Happy to adjust if the approach doesn't look right.
Without the provider UID in the alias hash, alerts from different clusters sharing the same involved object (e.g. Kustomization/flux-system/flux-system) would produce identical aliases and Opsgenie would aggregate them into a single incident. Thread the ProviderUID from notifierOptions into the Opsgenie struct and prepend it to the alias hash input so each cluster's Provider resource produces a unique alias.
Signed-off-by: cairon-ab <cairon-ab@users.noreply.github.com>
Signed-off-by: Cairon <cairon-ab@users.noreply.github.com>
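The effect of prepending the provider UID can be sketched roughly as follows. This is an illustration under assumptions rather than the code in the commit: `aliasWithProvider`, the `/` separator, and the two UID values are hypothetical, and the real function signature may differ.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"strings"
)

// aliasWithProvider prepends the provider UID to the hash input so that
// the same object and reason hash differently for different Providers.
// The parameter order and the "/" separator are assumptions of this sketch.
func aliasWithProvider(providerUID, kind, namespace, name, reason string) string {
	input := strings.Join([]string{providerUID, kind, namespace, name, reason}, "/")
	return fmt.Sprintf("%x", sha256.Sum256([]byte(input)))
}

func main() {
	// Hypothetical UIDs of the Provider resource in two clusters.
	uidA := "11111111-1111-1111-1111-111111111111"
	uidB := "22222222-2222-2222-2222-222222222222"

	a := aliasWithProvider(uidA, "Kustomization", "flux-system", "flux-system", "ReconciliationFailed")
	b := aliasWithProvider(uidB, "Kustomization", "flux-system", "flux-system", "ReconciliationFailed")

	fmt.Println(a != b) // prints "true": each cluster gets its own alias
	// Within one cluster the alias stays stable across repeated events.
	fmt.Println(a == aliasWithProvider(uidA, "Kustomization", "flux-system", "flux-system", "ReconciliationFailed")) // prints "true"
}
```

Because the UID belongs to the Provider resource, deleting and recreating a Provider would change the aliases it produces; whether that matters depends on how long incidents stay open.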
Summary
Set the `alias` field on Opsgenie alerts using a SHA-256 hash of the involved object's Kind, Namespace, Name, and event Reason. This allows Opsgenie to deduplicate repeated alerts for the same source instead of creating new pages for each firing.

Changes
- `Alias` field on the `OpsgenieAlert` struct with `json:"alias,omitempty"`
- `generateOpsgenieAlias()` function that creates a deterministic SHA-256 hash from the event's `InvolvedObject.Kind`, `InvolvedObject.Namespace`, `InvolvedObject.Name`, and `Reason`
- Alias set in `Post()` so every alert carries a stable alias
- Tests: `TestOpsgenie_PostAlias` (verifies the alias in the HTTP payload across multiple scenarios) and `TestGenerateOpsgenieAlias` (determinism, length, uniqueness)

Why
Currently, Opsgenie creates a new alert for every notification event, even when it is for the same object and reason. The Opsgenie Alert API supports an `alias` field that acts as a deduplication key: alerts with the same alias are grouped as a single incident.

By hashing `Kind/Namespace/Name/Reason`, we ensure:
- Repeated events for the same object and reason share one alias and are deduplicated
- Different reasons (e.g. `ReconciliationFailed` vs `HealthCheckFailed`) create separate alerts

Fixes #460
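As a rough sketch of how the `json:"alias,omitempty"` tag behaves in the request payload, the example below marshals a trimmed-down alert. The `opsgenieAlert` type here is a hypothetical reduction with only two fields, and `encodeAlert` is a helper invented for the example; the controller's real struct carries more fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// opsgenieAlert is a hypothetical, trimmed-down payload used only to
// show the effect of the omitempty tag on the alias field.
type opsgenieAlert struct {
	Message string `json:"message"`
	Alias   string `json:"alias,omitempty"`
}

// encodeAlert marshals the alert to JSON and returns it as a string.
func encodeAlert(a opsgenieAlert) string {
	b, err := json.Marshal(a)
	if err != nil {
		return fmt.Sprintf("marshal error: %v", err)
	}
	return string(b)
}

func main() {
	// With an alias set, the key appears in the payload.
	fmt.Println(encodeAlert(opsgenieAlert{Message: "reconciliation failed", Alias: "abc123"}))
	// prints {"message":"reconciliation failed","alias":"abc123"}

	// With an empty alias, omitempty drops the key entirely.
	fmt.Println(encodeAlert(opsgenieAlert{Message: "reconciliation failed"}))
	// prints {"message":"reconciliation failed"}
}
```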