feat(transport): add HTTP retry with exponential backoff by jpnurmi · Pull Request #1520 · getsentry/sentry-native

jpnurmi · 2026-02-13T15:44:54Z

Add HTTP retry with exponential backoff for network failures, modeled after Crashpad's upload retry behavior.

Failed envelopes are stored as <db>/cache/<ts>-<n>-<uuid>.envelope and retried on startup after a 100ms throttle, then with exponential backoff (15min, 30min, 1h, 2h, 8h). When retries are exhausted and offline caching is enabled, envelopes are stored as <db>/cache/<uuid>.envelope instead of being discarded.

```mermaid
flowchart TD
    startup --> R{retry?}
    R -->|yes| throttle
    R -->|no| C{cache?}
    throttle -. 100ms .-> resend
    resend -->|success| C
    resend -->|fail| C2[&lt;db&gt;/cache/<br/>&lt;ts&gt;-&lt;n&gt;-&lt;uuid&gt;.envelope]
    C2 --> backoff
    backoff -. 2ⁿ×15min .-> resend
    C -->|yes| CACHE[&lt;db&gt;/cache/<br/>&lt;uuid&gt;.envelope]
    C -->|no| discard
```
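As a rough illustration of the 2ⁿ×15min schedule in the flowchart, the delay for a retry file with attempt count `count` could be computed as below. This is a sketch, not the PR's code: `retry_backoff_ms`, `RETRY_INTERVAL_MS`, and `RETRY_MAX_SHIFT` are hypothetical names, and the clamping range is an assumption (the PR only states that `sentry__retry_backoff` clamps the count to avoid a negative shift).

```c
#include <assert.h>
#include <stdint.h>

#define RETRY_INTERVAL_MS (15ULL * 60 * 1000) /* 15 minutes in ms */
#define RETRY_MAX_SHIFT 16 /* hypothetical cap that keeps the shift in range */

/* Delay before the next attempt: 2^count * 15min. */
static uint64_t
retry_backoff_ms(int count)
{
    /* Shifting by a negative amount is undefined behavior in C, so clamp
     * corrupted counts to 0; cap large ones to avoid overflow. */
    if (count < 0) {
        count = 0;
    } else if (count > RETRY_MAX_SHIFT) {
        count = RETRY_MAX_SHIFT;
    }
    return (uint64_t)RETRY_INTERVAL_MS << count;
}
```

For counts 0 through 3 this yields 15min, 30min, 1h, and 2h.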

See also: https://develop.sentry.dev/sdk/expected-features/#buffer-to-disk

Depends on:

See also:

Support Offline Caching of Envelopes #1316

github-actions · 2026-02-13T15:45:22Z

Messages
📖 Do not forget to update Sentry-docs with your feature once the pull request gets approved.

Generated by 🚫 dangerJS against ffce486

jpnurmi · 2026-02-16T08:59:33Z

@sentry review

jpnurmi · 2026-02-16T08:59:39Z

@cursor review

src/sentry_sync.c

src/transports/sentry_http_transport.c

src/sentry_retry.c

src/transports/sentry_http_transport_winhttp.c

src/sentry_retry.c

jpnurmi · 2026-02-16T15:21:30Z

@sentry review

jpnurmi · 2026-02-16T15:21:38Z

@cursor review

src/transports/sentry_http_transport.c

src/sentry_retry.c

jpnurmi · 2026-02-16T16:09:06Z

@cursor review

src/sentry_sync.c

src/sentry_database.c

jpnurmi · 2026-02-16T16:41:48Z

@cursor review

cursor

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

jpnurmi · 2026-02-16T19:42:40Z

@cursor review

src/sentry_retry.c

src/sentry_transport.c

jpnurmi · 2026-02-17T09:34:34Z

@cursor review

src/sentry_retry.c

src/sentry_sync.c

jpnurmi · 2026-02-17T10:27:24Z

@sentry review

jpnurmi · 2026-02-17T10:27:33Z

@cursor review

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Store parsed fields (ts, count, uuid) alongside the path during the filter phase so handle_result and future debug logging can use them without re-parsing. Also improves sort performance by comparing numeric fields before falling back to string comparison. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
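A sketch of what storing the parsed fields and sorting on them might look like, assuming files are ordered by timestamp and then by attempt count (the commit does not say which numeric field comes first). `retry_item_t` and `retry_item_cmp` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record: a retry file path plus the fields parsed from its
 * name <ts>-<count>-<uuid>.envelope, so later stages need not re-parse. */
typedef struct {
    const char *path;
    uint64_t ts;
    int count;
    const char *uuid;
} retry_item_t;

/* qsort comparator: timestamp first, then attempt count; compare the path
 * strings only when both numeric fields tie. */
static int
retry_item_cmp(const void *a, const void *b)
{
    const retry_item_t *x = a;
    const retry_item_t *y = b;
    if (x->ts != y->ts) {
        return x->ts < y->ts ? -1 : 1;
    }
    if (x->count != y->count) {
        return x->count < y->count ? -1 : 1;
    }
    return strcmp(x->path, y->path);
}
```

Passed to `qsort`, the comparator settles most orderings with integer comparisons and reaches `strcmp` only on ties.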

Log retry attempts at DEBUG level and max-retries-reached at WARN level to make retry behavior observable. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…writes

Three places independently constructed <database>/cache and wrote envelopes there. Add cache_path to sentry_run_t and introduce sentry__run_write_cache() and sentry__run_move_cache() to centralize the cache directory creation and file operations. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

CURLOPT_TIMEOUT_MS is a total transfer timeout that could cut off large envelopes. Use CURLOPT_CONNECTTIMEOUT_MS instead so only connection establishment is bounded. For WinHTTP, limit resolve and connect to 15s but leave send/receive at their defaults. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Without this, sentry__retry_send overcounts remaining files, causing an unnecessary extra poll cycle. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Restructure handle_result so "max retries reached" warnings only fire on actual network failures, not on successful delivery at the last attempt. Separate the warning logic from the cache/discard actions and put the re-enqueue branch first for clarity. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
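The restructured control flow can be sketched as a pure decision function. Everything here is hypothetical: the enum names, the `warn` out-parameter, treating `count` as the number of attempts already made, and removing the file after a successful send. `RETRY_ATTEMPTS` mirrors the `SENTRY_RETRY_ATTEMPTS = 5` constant mentioned in a later commit.

```c
#include <assert.h>
#include <stdbool.h>

#define RETRY_ATTEMPTS 5 /* mirrors SENTRY_RETRY_ATTEMPTS from the PR */

typedef enum { SEND_OK, SEND_NETWORK_ERROR } send_status_t;
typedef enum {
    ACTION_REMOVE,    /* delivered: the retry file is no longer needed */
    ACTION_REENQUEUE, /* keep the file and try again after backoff */
    ACTION_CACHE,     /* attempts exhausted, offline caching enabled */
    ACTION_DISCARD    /* attempts exhausted, offline caching disabled */
} retry_action_t;

static retry_action_t
handle_result(send_status_t status, int count, bool cache_enabled, bool *warn)
{
    *warn = false;
    /* Re-enqueue branch first: a network failure with attempts left. */
    if (status == SEND_NETWORK_ERROR && count + 1 < RETRY_ATTEMPTS) {
        return ACTION_REENQUEUE;
    }
    /* The warning fires only for a real failure on the last attempt... */
    if (status == SEND_NETWORK_ERROR) {
        *warn = true;
        return cache_enabled ? ACTION_CACHE : ACTION_DISCARD;
    }
    /* ...never for a successful send, even on the last attempt. */
    return ACTION_REMOVE;
}
```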

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace the `can_retry` bool on the transport with a `retry_func` callback, and expose `sentry_transport_retry()` as an experimental public API for explicitly retrying all pending envelopes, e.g. when coming back online. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
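A minimal sketch of the callback approach, with hypothetical names (`transport_t`, `transport_retry`, `http_retry_all`); the real signatures of `retry_func` and `sentry_transport_retry()` are not shown on this page. The idea is that "can retry" becomes "has a callback", and the public entry point does nothing for transports without one.

```c
#include <assert.h>
#include <stddef.h>

typedef struct transport_s transport_t;
struct transport_s {
    /* Optional: NULL means the transport does not support retries. */
    void (*retry_func)(transport_t *transport);
    int pending_retries; /* illustration only */
};

static int
transport_can_retry(const transport_t *transport)
{
    return transport != NULL && transport->retry_func != NULL;
}

/* "Retry everything now", e.g. when the app learns it is back online. */
static void
transport_retry(transport_t *transport)
{
    if (transport_can_retry(transport)) {
        transport->retry_func(transport);
    }
}

/* Stand-in for an HTTP transport's callback: pretend all pending
 * envelopes were resent successfully. */
static void
http_retry_all(transport_t *transport)
{
    transport->pending_retries = 0;
}
```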

Move retry envelopes from a separate retry/ directory into cache/ so that sentry__cleanup_cache() enforces disk limits for both file formats out of the box. The two formats are distinguishable by length: retry files use <ts>-<count>-<uuid>.envelope (49+ chars) while cache files use <uuid>.envelope (45 chars). Default http_retries to 0 (opt-in). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
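The length rule can be checked with a few lines of C. This hypothetical `classify_cache_file` relies only on the lengths stated above: a 36-character UUID plus the 9-character `.envelope` suffix gives 45, and the shortest retry name adds a 1-digit timestamp, a 1-digit count, and two hyphens for 49. A real implementation would still parse the fields to validate them.

```c
#include <assert.h>
#include <string.h>

#define UUID_LEN 36 /* 8-4-4-4-12 hex digits plus four hyphens */
#define ENVELOPE_EXT ".envelope"

typedef enum { FILE_OTHER, FILE_CACHE, FILE_RETRY } cache_file_kind_t;

static cache_file_kind_t
classify_cache_file(const char *name)
{
    size_t len = strlen(name);
    size_t ext_len = strlen(ENVELOPE_EXT);

    /* Only *.envelope files are candidates. */
    if (len < ext_len || strcmp(name + len - ext_len, ENVELOPE_EXT) != 0) {
        return FILE_OTHER;
    }
    if (len == UUID_LEN + ext_len) {
        return FILE_CACHE; /* <uuid>.envelope: exactly 45 chars */
    }
    if (len >= UUID_LEN + ext_len + 4) {
        return FILE_RETRY; /* <ts>-<count>-<uuid>.envelope: 49+ chars */
    }
    return FILE_OTHER;
}
```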

When bgworker is detached during shutdown timeout, retry_poll_task can access retry->run->cache_path after sentry_options_free frees the run. Clone the path so it outlives options and is freed with the bgworker. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The bgworker_flush in sentry__retry_flush would delay its flush_task by min(delayed_task_time, timeout) when a 15-minute delayed retry_poll_task existed. This consumed the entire shutdown timeout, leaving 0ms for bgworker_shutdown, which then detached the worker thread. On Windows, winhttp_client_shutdown would close handles still in use by the detached thread, causing a crash. The flush is unnecessary because retry_flush_task is an immediate task and bgworker_shutdown already processes all immediate tasks before the shutdown_task runs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The previous commit removed bgworker_flush from retry_flush, which caused a race between WinHTTP connect timeout (~2s) and bgworker shutdown (2s). Restore the flush and pass the full timeout to both flush and shutdown — after flush drains in-flight work, shutdown completes near-instantly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Make retry count an internal constant (SENTRY_RETRY_ATTEMPTS = 5) and expose only a boolean toggle. Enabled by default. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

0 means infinite, not default. Pass 30000ms to match WinHTTP defaults. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Use a 'scheduled' flag with atomic compare-and-swap to ensure at most one retry_poll_task is queued at a time. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
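A minimal C11 sketch of an "at most one queued poll task" guard using `atomic_compare_exchange_strong`. The struct, the function names, and clearing the flag when the poll task starts are assumptions for illustration; the counter stands in for submitting a delayed task to the background worker.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_int scheduled; /* 1 while a poll task is queued */
    int queued_tasks;     /* stands in for the worker queue in this sketch */
} retry_state_t;

/* Queue a poll task only if none is queued yet. The compare-and-swap turns
 * "check the flag, then set it" into one atomic step, so of several callers
 * racing on 0 -> 1, exactly one wins. */
static bool
retry_schedule_poll(retry_state_t *retry)
{
    int expected = 0;
    if (!atomic_compare_exchange_strong(&retry->scheduled, &expected, 1)) {
        return false; /* a poll task is already queued */
    }
    retry->queued_tasks++; /* real code would submit a delayed task here */
    return true;
}

/* Run at the start of the poll task so the next poll can be scheduled. */
static void
retry_poll_started(retry_state_t *retry)
{
    atomic_store(&retry->scheduled, 0);
}
```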

Move `sealed = 1` before `foreach_matching` in `retry_dump_queue` to prevent the detached worker from writing duplicate envelopes via `retry_enqueue` while the main thread is dumping the queue. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
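The ordering fix can be sketched as follows, with hypothetical names and counters in place of file writes. Setting `sealed` before the dump loop means any `retry_enqueue` call that observes the flag from then on skips its write; setting it after the loop would leave the whole dump as a window in which both threads write files.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_int sealed;
    atomic_int files_written; /* counts writes from either thread */
} retry_queue_t;

/* May run on a (possibly detached) worker thread. */
static bool
retry_enqueue(retry_queue_t *q)
{
    if (atomic_load(&q->sealed)) {
        return false; /* the shutdown dump owns the remaining envelopes */
    }
    atomic_fetch_add(&q->files_written, 1);
    return true;
}

/* Runs on the main thread at shutdown: seal first, then dump. */
static void
retry_dump_queue(retry_queue_t *q, int queued_envelopes)
{
    atomic_store(&q->sealed, 1);
    for (int i = 0; i < queued_envelopes; i++) {
        /* stands in for foreach_matching writing one retry file */
        atomic_fetch_add(&q->files_written, 1);
    }
}
```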

Drop the delayed retry_poll_task before bgworker_flush to prevent it from delaying the flush_task by min(retry_interval, timeout). Subtract elapsed flush time from the shutdown timeout so the total is bounded. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
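Bounding the total by subtracting the flush time amounts to passing shutdown only the remaining budget. A hypothetical helper; the clamp matters because `uint64_t` subtraction would wrap around to a huge timeout if the flush overran its budget.

```c
#include <assert.h>
#include <stdint.h>

/* Budget left for shutdown after the flush used `elapsed_ms` of
 * `timeout_ms`; never negative, never wraps. */
static uint64_t
remaining_timeout_ms(uint64_t timeout_ms, uint64_t elapsed_ms)
{
    return elapsed_ms >= timeout_ms ? 0 : timeout_ms - elapsed_ms;
}
```

Used as: flush with `timeout`, measure `elapsed`, then shut down with `remaining_timeout_ms(timeout, elapsed)`, so flush plus shutdown stays within `timeout`.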

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

When the bgworker is detached after shutdown timeout, retry_dump_queue writes retry files and sets sealed=1. The detached thread could then run retry_flush_task and re-send those files, causing duplicates. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The retry system writes cache files directly via its own paths. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

retry_trigger_task recursively re-triggered itself on network failure, bypassing exponential backoff (UINT64_MAX skips the backoff check) and burning through all 5 retry attempts in milliseconds. Since sentry__retry_send already processes all cached envelopes in a single call, the re-trigger is only ever reached on network failure — exactly the case where it's harmful. Make the trigger one-shot; failed items are left for the regular poll task which respects backoff. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
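A small model of why the one-shot trigger matters, under stated assumptions: `RETRY_NOW` stands for the `UINT64_MAX` cutoff that skips the backoff check, the backoff is 2^count × 15min, and every send fails (an outage). The struct and function names are hypothetical. One forced pass consumes one attempt and later passes wait for the backoff, whereas re-triggering with `RETRY_NOW` in a loop increments `count` on every pass with no waiting at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RETRY_INTERVAL_MS (15ULL * 60 * 1000)
#define RETRY_NOW UINT64_MAX /* cutoff that bypasses the backoff check */

typedef struct {
    uint64_t ts; /* time of the last attempt, ms */
    int count;   /* attempts made so far */
} retry_file_t;

/* Due once 2^count * 15min has passed since the last attempt. */
static bool
retry_is_due(const retry_file_t *f, uint64_t cutoff)
{
    if (cutoff == RETRY_NOW) {
        return true;
    }
    uint64_t backoff = RETRY_INTERVAL_MS << (f->count < 0 ? 0 : f->count);
    return cutoff >= f->ts + backoff;
}

/* One send pass during an outage: a due file is attempted, fails, and is
 * restamped with the current time and a higher attempt count. */
static void
retry_send_pass(retry_file_t *f, uint64_t cutoff, uint64_t now_ms)
{
    if (retry_is_due(f, cutoff)) {
        f->count++;
        f->ts = now_ms;
    }
}
```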

cleanup_cache was gated on sentry__transport_can_retry, which checks for retry_func. Since retry_func is unconditionally set for all HTTP transports, this ran cleanup_cache even with http_retry disabled. Check the option directly instead. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Reject negative counts in parse_filename (a corrupted filename like 123--01-<uuid>.envelope parses count=-1 via strtol). Also clamp the count in sentry__retry_backoff to prevent left-shift by a negative amount. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
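A sketch of the rejection check, with hypothetical names: `strtol` accepts a leading sign, so the count in `123--01-<uuid>.envelope` parses as -1 unless negative values are refused explicitly.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parse the "<ts>-<count>-" prefix of a retry file name. On success,
 * *rest points at "<uuid>.envelope". */
static bool
parse_retry_prefix(const char *name, long long *ts, long *count,
    const char **rest)
{
    char *end;

    errno = 0;
    *ts = strtoll(name, &end, 10);
    if (errno != 0 || end == name || *ts < 0 || *end != '-') {
        return false;
    }

    const char *p = end + 1;
    errno = 0;
    *count = strtol(p, &end, 10);
    /* "123--01-..." reaches here with p = "-01-...", giving -1: reject. */
    if (errno != 0 || end == p || *count < 0 || *end != '-') {
        return false;
    }

    *rest = end + 1;
    return true;
}
```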

src/sentry_options.c

…ters Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

jpnurmi · 2026-02-17T11:22:11Z

@cursor review

cursor

✅ Bugbot reviewed your changes and found no new issues!

jpnurmi force-pushed the jpnurmi/feat/http-retry branch 2 times, most recently from b083a57 to a264f66 on February 13, 2026 17:47

sentry bot reviewed Feb 16, 2026

src/sentry_sync.c (resolved)

src/transports/sentry_http_transport.c (resolved)

src/sentry_retry.c (outdated, resolved)