Extend HttpClient retry logic to retry on 502/503/504 #1446

Open
nevinera wants to merge 1 commit into Shopify:main from nevinera:retry-on-more-5xx-errors

@nevinera nevinera commented May 4, 2026

Description

We've been seeing sporadic 502s from the Storefront API for months at minimum. I've wrapped a retry layer around it on our end, but it seemed like the retry would fit better inside the gem itself. I haven't hit 503s or 504s from this service so far, but those are also typical indications of an overloaded node or a transient infrastructure issue, so I included them as well.

The impact of the change should be that, for requests with retries specified, 502/503/504 response codes are treated just like 500 errors: the client waits one second and then retries, up to a total of `tries` requests.
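
The intended behavior can be sketched roughly as follows. This is a minimal, self-contained illustration and not the gem's actual implementation: `RETRIABLE_STATUSES`, `RETRY_WAIT_SECONDS`, `request_with_retries`, and the injectable `sleeper` are hypothetical names, and the request is modeled as a block that returns an HTTP status code. The sketch also returns the last status once the attempts run out; how the gem itself reports exhausted retries is not covered here.

```ruby
# Hypothetical sketch of the retry behavior described above; the names are
# illustrative and are not taken from the gem's source.
RETRIABLE_STATUSES = [500, 502, 503, 504].freeze
RETRY_WAIT_SECONDS = 1

# Calls the block up to `tries` times and returns the last status seen.
# `sleeper` is injectable so the wait can be stubbed out in tests.
def request_with_retries(tries:, sleeper: ->(seconds) { sleep(seconds) })
  attempts = 0
  loop do
    attempts += 1
    status = yield
    # Stop on any non-retriable status, or once the attempt budget is spent.
    return status unless RETRIABLE_STATUSES.include?(status)
    return status if attempts >= tries

    sleeper.call(RETRY_WAIT_SECONDS)
  end
end
```

With `tries: 1` the block runs exactly once, so callers that do not ask for retries see no change in this model.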

There was no documentation of the behavior of the `tries` parameter; I added a section to `docs/usage/rest.md` explaining it, but I'm happy to remove that change if it's too much detail for the doc.

How has this been tested?

I've added HttpClient tests ensuring the new codes get retried like 500s do. I can't reproduce the 502s with the real gem without significant API request load (they occur for less than 0.05% of requests), but if that's expected, I can also run the forked gem in production for a few days.

Checklist:

  • My commit message follows the pattern described here
  • I have performed a self-review of my own code.
  • I have added tests that prove my fix is effective or that my feature works.
  • I have updated the project documentation.
  • I have added a changelog line.

@github-actions github-actions bot added the cla-needed and devtools-gardener ("Post the issue or PR to Slack for the gardener") labels May 4, 2026
Previously, HttpClient retried requests (when 'tries' was large enough)
under two conditions: on 429 (obeying the retry-after header) and on 500
(retrying after 1 second).

The 502 (Bad Gateway), 503 (Service Unavailable), and 504 (Gateway
Timeout) statuses are all _infrastructure_ prompted conditions, but from
the caller's perspective they are all equivalent to a 500: "the service
failed to handle my request". Allowing retries to apply to these
statuses is natural and appropriate.

Add a small section to the docs describing the behavior of the 'tries'
parameter in more detail, including the updated behavior.
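
The wait rules in the commit message above could be expressed as a small helper. This is only a sketch under stated assumptions: `SERVER_ERROR_STATUSES` and `retry_wait_seconds` are hypothetical names, header keys are assumed to be lowercase strings, the retry-after value is assumed to be a number of seconds (not an HTTP date), and the 1-second fallback for a 429 without that header is an assumption rather than documented behavior.

```ruby
# Hypothetical helper, not the gem's code: returns how many seconds to wait
# before retrying, or nil when the status should not be retried at all.
SERVER_ERROR_STATUSES = [500, 502, 503, 504].freeze

def retry_wait_seconds(status, headers = {})
  case status
  when 429
    # Obey the server's retry-after header (assumed numeric seconds).
    # Falling back to 1 second when it is missing is an assumption.
    Float(headers.fetch("retry-after", 1))
  when *SERVER_ERROR_STATUSES
    # 500 and, with this change, 502/503/504 all wait a fixed 1 second.
    1.0
  end
end
```

Keeping the decision in one place like this means the set of retried statuses can grow without touching the loop that performs the retries.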
@nevinera nevinera force-pushed the retry-on-more-5xx-errors branch from e42110a to cbdc083 May 4, 2026 13:29
