Description
CI run: https://github.com/coder/coder/actions/runs/21086635027
Commit: 4d414a0df79ed37dafff5c9d5951d5799a63d672 ("feat: add --use-parameter-defaults flag") by Asher [email protected]
What failed
Two separate jobs failed because sum.golang.org returned 500 while Go was verifying modules:
lint job
... verifying module: github.com/prometheus/[email protected]: reading https://sum.golang.org/tile/8/0/x025/567: 500 Internal Server Error
offlinedocs job (during setup-sqlc)
... github.com/pganalyze/pg_query_go/[email protected]: verifying module: ... reading https://sum.golang.org/tile/8/0/x141/114: 500 Internal Server Error
The required job then failed because these required checks were red.
Root cause classification
Infrastructure / external dependency outage (Go checksum database).
Why this is worth tracking
Even when the upstream outage is only intermittent, CI hard-fails, because go install / go mod download have no built-in retries. We may want CI-level mitigation.
Suggested mitigations
- Wrap Go module download/install steps in retry/backoff (especially tool installs in CI actions).
- Consider a checksum DB fallback/mirror for CI (e.g. alternate GOSUMDB), if acceptable.
Assignment rationale
This is CI resiliency work (not tied to a particular product component). Assigning to kacpersaw as a recent maintainer of CI resiliency changes (e.g. get.helm.sh outage fallback in PR coder/coder#21268).