Summary
When using controller-runtime's priority queue (UsePriorityQueue: true), the workqueue_depth metric can cause unbounded memory growth and extremely slow metrics serialization times, leading to Prometheus scrape timeouts.
Problem
The workqueue_depth metric includes a priority label that creates a new metric time series for each unique priority value:
workqueue_depth{name="my-controller", controller="my-controller", priority="0"} 0
workqueue_depth{name="my-controller", controller="my-controller", priority="1"} 0
workqueue_depth{name="my-controller", controller="my-controller", priority="12345"} 0
The Prometheus client library never automatically cleans up these metric entries, even after items are dequeued. If an application uses incrementing priority values (e.g., for LIFO ordering), each enqueue creates a persistent metric entry.
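The accumulation described above can be reproduced with a small stdlib-only model. This is a sketch, not the Prometheus client: `gaugeVec`, `add`, and `simulate` are hypothetical names, and the only behavior assumed is the one stated above, namely that a series stays registered after its value returns to zero.

```go
package main

import (
	"fmt"
	"strconv"
)

// gaugeVec is a toy stand-in for a labeled gauge vector: one map entry per
// unique label combination. It is NOT the Prometheus client; it only models
// the fact that a series, once created, stays until explicitly deleted.
type gaugeVec struct {
	series map[string]float64
}

func newGaugeVec() *gaugeVec {
	return &gaugeVec{series: make(map[string]float64)}
}

// add creates the series on first use and adjusts its value by delta.
func (g *gaugeVec) add(name string, priority int, delta float64) {
	key := name + "|" + strconv.Itoa(priority)
	g.series[key] += delta
}

// simulate enqueues and immediately dequeues n items, assigning each a
// priority via nextPriority, and returns the number of series left behind.
func simulate(n int, nextPriority func(i int) int) int {
	g := newGaugeVec()
	for i := 0; i < n; i++ {
		p := nextPriority(i)
		g.add("my-controller", p, 1)  // enqueue
		g.add("my-controller", p, -1) // dequeue: value is 0 again, entry remains
	}
	return len(g.series)
}

func main() {
	incrementing := simulate(10000, func(i int) int { return i })
	bounded := simulate(10000, func(i int) int { return i % 100 })
	fmt.Println("series with incrementing priorities:", incrementing)
	fmt.Println("series with priorities limited to 100 values:", bounded)
}
```

In this model the queue is empty at the end of both runs, yet the incrementing run leaves one zero-valued series per item, while the bounded run leaves at most 100.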
Impact
In a real-world scenario with ~70 controllers using incrementing priorities:
- 810K+ unique metric entries accumulated over time
- 15+ second metrics serialization time (exceeds typical 10s scrape timeout)
- Prometheus scrape failures with "broken pipe" errors
Root Cause
The metric is defined in pkg/internal/metrics/workqueue.go:
var (
	depth = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Subsystem: WorkQueueSubsystem,
		Name:      DepthKey,
		Help:      "Current depth of workqueue by workqueue and priority",
	}, []string{"name", "controller", "priority"}) // <-- priority label
)
And used in depthWithPriorityMetric:
func (g *depthWithPriorityMetric) Inc(priority int) {
	depth.WithLabelValues(append(g.lvs, strconv.Itoa(priority))...).Inc()
}
Each unique priority integer becomes a distinct label value, creating a new persistent metric entry.
Workaround
Applications using custom priorities should bound their priority values to a small range (e.g., 0-100) to limit metric cardinality:
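const maxPriority = 100 // upper bound on the number of distinct priority values

var priorityCounter atomic.Uint32

func enqueue(item T) {
	// Taking the counter modulo maxPriority keeps every value in
	// [0, maxPriority), even when several goroutines enqueue concurrently.
	// A separate Add followed by Store(0) is not atomic and can briefly
	// hand out values above the bound.
	priority := int(priorityCounter.Add(1) % maxPriority)
	queue.AddWithOpts(priorityqueue.AddOpts{Priority: ptr.To(priority)}, item)
}
Potential Solutions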
- Document the cardinality risk: add warnings to the priority queue documentation about metric cardinality when using custom priorities
- Make the priority label optional: add configuration to disable the priority label for users who don't need per-priority observability
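
As a rough illustration of the second option, the sketch below models a depth metric whose priority label can be switched off. It is a stdlib-only toy, not a proposed patch: the `depthMetric` type, its `includePriority` field, and the `seriesAfter` helper are all hypothetical, and controller-runtime does not currently expose such an option.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// depthMetric is a toy model of a per-queue depth gauge. The includePriority
// field is a hypothetical option used only for this illustration.
type depthMetric struct {
	lvs             []string       // fixed label values, e.g. name and controller
	includePriority bool           // when false, all priorities share one series
	series          map[string]int // one entry per unique label combination
}

func newDepthMetric(includePriority bool, lvs ...string) *depthMetric {
	return &depthMetric{
		lvs:             lvs,
		includePriority: includePriority,
		series:          make(map[string]int),
	}
}

// key builds the label tuple for one observation.
func (d *depthMetric) key(priority int) string {
	parts := append([]string{}, d.lvs...) // copy so d.lvs is never mutated
	if d.includePriority {
		parts = append(parts, strconv.Itoa(priority))
	}
	return strings.Join(parts, ",")
}

func (d *depthMetric) Inc(priority int) { d.series[d.key(priority)]++ }
func (d *depthMetric) Dec(priority int) { d.series[d.key(priority)]-- }

// seriesAfter enqueues and dequeues n items with distinct priorities and
// returns how many series the metric holds afterwards.
func seriesAfter(includePriority bool, n int) int {
	d := newDepthMetric(includePriority, "my-controller", "my-controller")
	for p := 0; p < n; p++ {
		d.Inc(p)
		d.Dec(p)
	}
	return len(d.series)
}

func main() {
	fmt.Println("with priority label:", seriesAfter(true, 1000))
	fmt.Println("without priority label:", seriesAfter(false, 1000))
}
```

With the label disabled, the model keeps a single series per queue regardless of how many distinct priorities are used, at the cost of losing the per-priority breakdown.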
Environment
- controller-runtime version: v0.22.2
- Go version: 1.24.x
- Kubernetes version: 1.31.x