priority queue metrics cause unbounded memory growth and scrape timeouts #3396

@chenghu2

Description

Summary

When using controller-runtime's priority queue (UsePriorityQueue: true), the workqueue_depth metric can cause unbounded memory growth and extremely slow metrics serialization times, leading to Prometheus scrape timeouts.

Problem

The workqueue_depth metric includes a priority label that creates a new metric time series for each unique priority value:

workqueue_depth{name="my-controller", controller="my-controller", priority="0"} 0
workqueue_depth{name="my-controller", controller="my-controller", priority="1"} 0
workqueue_depth{name="my-controller", controller="my-controller", priority="12345"} 0

The Prometheus client library never automatically cleans up these metric entries, even after items are dequeued. If an application uses incrementing priority values (e.g., for LIFO ordering), each enqueue creates a persistent metric entry.
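
The persistence is easy to reproduce with client_golang alone. A minimal standalone sketch (outside controller-runtime; the metric and label names mirror the ones above):

package main

import (
    "fmt"
    "strconv"

    "github.com/prometheus/client_golang/prometheus"
)

func main() {
    depth := prometheus.NewGaugeVec(prometheus.GaugeOpts{
        Name: "workqueue_depth",
        Help: "Current depth of workqueue by workqueue and priority",
    }, []string{"name", "controller", "priority"})

    // Every distinct priority value materializes a new child series.
    for p := 0; p < 1000; p++ {
        g := depth.WithLabelValues("my-controller", "my-controller", strconv.Itoa(p))
        g.Inc()
        g.Dec() // depth is back to zero, but the series is never removed
    }

    // Count the children the vector will export on every scrape.
    ch := make(chan prometheus.Metric)
    go func() { depth.Collect(ch); close(ch) }()
    n := 0
    for range ch {
        n++
    }
    fmt.Println("exported series:", n) // prints 1000
}

The only way to drop a child series is an explicit DeleteLabelValues or DeletePartialMatch call, and the workqueue metrics never make one.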

Impact

In a real-world scenario with ~70 controllers using incrementing priorities:

  • 810K+ unique metric entries accumulated over time
  • 15+ second metrics serialization time (exceeds typical 10s scrape timeout)
  • Prometheus scrape failures with "broken pipe" errors

Root Cause

The metric is defined in pkg/internal/metrics/workqueue.go:

var (
    depth = prometheus.NewGaugeVec(prometheus.GaugeOpts{
        Subsystem: WorkQueueSubsystem,
        Name:      DepthKey,
        Help:      "Current depth of workqueue by workqueue and priority",
    }, []string{"name", "controller", "priority"}) // <-- priority label
)

And used in depthWithPriorityMetric:

func (g *depthWithPriorityMetric) Inc(priority int) {
    depth.WithLabelValues(append(g.lvs, strconv.Itoa(priority))...).Inc()
}

Each unique priority integer becomes a distinct label value, creating a new persistent metric entry.

Workaround

Applications using custom priorities should bound their priority values to a small range (e.g., 0-100) to limit metric cardinality:

import (
    "sync/atomic"

    "k8s.io/utils/ptr"
    "sigs.k8s.io/controller-runtime/pkg/controller/priorityqueue"
)

const maxPriority = 100

var priorityCounter atomic.Int64

// enqueue wraps the counter at maxPriority, so the priority label can only
// ever take maxPriority distinct values per controller.
func enqueue[T comparable](queue priorityqueue.PriorityQueue[T], item T) {
    priority := int(priorityCounter.Add(1) % maxPriority)
    queue.AddWithOpts(priorityqueue.AddOpts{Priority: ptr.To(priority)}, item)
}
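
Wrapping trades strict ordering for bounded cardinality: an item enqueued just after the counter wraps gets a low priority and is popped after items enqueued just before the wrap. In exchange, each controller can emit at most maxPriority depth series.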

Potential Solutions

  1. Document the cardinality risk - Add warnings to priority queue documentation about metric cardinality when using custom priorities

  2. Make priority label optional - Add configuration to disable the priority label for users who don't need per-priority observability (see the sketch after this list)
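
What option 2 might look like, building on the depthWithPriorityMetric snippet quoted above. The disablePriorityLabel toggle is hypothetical, not an existing controller-runtime option:

var disablePriorityLabel bool // hypothetical opt-out, set before controllers start

func (g *depthWithPriorityMetric) Inc(priority int) {
    if disablePriorityLabel {
        // Collapse all priorities into one constant label value so the
        // series count per controller stays fixed.
        depth.WithLabelValues(append(g.lvs, "")...).Inc()
        return
    }
    depth.WithLabelValues(append(g.lvs, strconv.Itoa(priority))...).Inc()
}

Dec would need the same guard so increments and decrements land on the same series.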

Environment

  • controller-runtime version: v0.22.2
  • Go version: 1.24.x
  • Kubernetes version: 1.31.x
