fix(sandbox): eliminate startup stale-policy forward proxy race #1942

@drew

Description

PR #1929 deflakes the forward proxy GraphQL L7 e2e by retrying expected-allowed requests that temporarily return 403. The linked review explains the root cause: the startup symlink-resolution policy reload can run after the entrypoint PID and /proc/<pid>/root become available, which advances the policy generation while the first allowed HTTP forward request is still being evaluated. The request is then evaluated against a stale generation and receives a transient 403, even though the L7 policy should allow it.

Review discussion: #1929 (comment)

Context

The PR workaround is intentionally test-scoped: it retries only requests that the test expects to be allowed, while expected-denied requests stay single-shot. That avoids masking regressions that make the policy too permissive, but it does not address the race in the product itself.
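For illustration, the asymmetry between expected-allowed and expected-denied requests could look roughly like the sketch below. It is not the code from PR #1929; the helper name `send_with_expectation`, its parameters, and the `send` callable (which returns an HTTP status code) are all hypothetical.

```python
import time
from typing import Callable


def send_with_expectation(
    send: Callable[[], int],
    expect_allowed: bool,
    attempts: int = 5,
    delay_s: float = 0.0,
) -> int:
    """Return the final HTTP status for one test request.

    Hypothetical helper: `send` performs the request and returns its status.
    """
    if not expect_allowed:
        # Expected-denied requests are single-shot, so a too-permissive
        # policy cannot be hidden behind a retry loop.
        return send()

    status = send()
    for _ in range(attempts - 1):
        if status != 403:
            break
        # Only a 403 on an expected-allowed request is treated as transient.
        time.sleep(delay_s)
        status = send()
    return status
```

With this shape, an expected-denied request that unexpectedly succeeds is reported on its first and only attempt, while an expected-allowed request tolerates a bounded number of 403 responses.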

Root-cause directions captured in review:

  • Add explicit startup readiness, for example a narrow internal policy.local endpoint such as GET http://policy.local/v1/policy/startup-ready?timeout=10, which resolves once the initial symlink-resolution reload has succeeded, failed non-fatally, or been skipped.
  • Or handle the race inside the forward proxy: for one-shot HTTP forward requests, if policy generation changes before any bytes are written upstream, re-evaluate once against the current generation instead of returning a transient 403.
  • Account for GraphQL and chunked request bodies carefully. A body buffered for inspection must not be re-read or replayed once any bytes have been written upstream.
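The proxy-side direction can be sketched as a small in-memory model. This is a sketch under stated assumptions rather than the actual proxy code: `PolicyStore`, `forward_once`, the host-allowlist policy, and the `after_evaluate` hook (used only to simulate a reload landing mid-evaluation) are all hypothetical names. The request body is a single buffered `bytes` value that is written upstream at most once, after the decision is final.

```python
import threading
from typing import Callable, FrozenSet, Iterable, Optional, Tuple


class PolicyStore:
    """Hypothetical policy holder with a monotonically increasing generation."""

    def __init__(self, allowed_hosts: Iterable[str]) -> None:
        self._lock = threading.Lock()
        self._generation = 1
        self._allowed: FrozenSet[str] = frozenset(allowed_hosts)

    def snapshot(self) -> Tuple[int, FrozenSet[str]]:
        with self._lock:
            return self._generation, self._allowed

    def generation(self) -> int:
        with self._lock:
            return self._generation

    def reload(self, allowed_hosts: Iterable[str]) -> None:
        # Models the startup symlink-resolution reload advancing the generation.
        with self._lock:
            self._allowed = frozenset(allowed_hosts)
            self._generation += 1


def forward_once(
    store: PolicyStore,
    host: str,
    body: bytes,
    write_upstream: Callable[[bytes], None],
    after_evaluate: Optional[Callable[[], None]] = None,
) -> int:
    """Evaluate a one-shot forward request, re-evaluating at most once."""
    retried = False
    while True:
        generation, allowed = store.snapshot()
        decision = host in allowed  # stand-in for the real L7 evaluation
        if after_evaluate is not None:
            after_evaluate()  # test hook: lets a reload land here
        if store.generation() == generation:
            break  # decision was made against the current generation
        if retried:
            return 403  # generation moved again; keep the fail-closed result
        retried = True  # nothing written upstream yet, so re-evaluate once

    if not decision:
        return 403
    write_upstream(body)  # the buffered body is written exactly once
    return 200
```

The re-evaluation happens strictly before `write_upstream` is called, so the buffered body is never replayed. A denied request remains denied after re-evaluation, which keeps expected-denied L7 regressions visible.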

Definition of Done

  • Choose whether readiness signaling, proxy-side re-evaluation, or another design is the product fix.
  • Implement the fix without hiding expected-denied L7 regressions.
  • Keep or update e2e coverage so expected-allowed startup requests no longer need ad hoc retry logic.
  • Document any new internal diagnostic API or forward proxy generation-retry semantics if introduced.
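If readiness signaling is the chosen design, the state behind a startup-ready endpoint could be a one-shot latch that resolves on any of the three terminal outcomes named in the root-cause directions. The sketch below is an in-process model only; `StartupOutcome`, `StartupReadiness`, and their methods are hypothetical names, and no HTTP handler or real endpoint contract is implied.

```python
import threading
from enum import Enum
from typing import Optional


class StartupOutcome(Enum):
    """Terminal states of the initial symlink-resolution reload (assumed)."""

    SUCCEEDED = "succeeded"
    FAILED_NON_FATAL = "failed_non_fatal"
    SKIPPED = "skipped"


class StartupReadiness:
    """One-shot latch: resolves once, then every waiter sees the same outcome."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._event = threading.Event()
        self._outcome: Optional[StartupOutcome] = None

    def resolve(self, outcome: StartupOutcome) -> None:
        with self._lock:
            if self._outcome is None:  # first resolution wins
                self._outcome = outcome
                self._event.set()

    def wait(self, timeout_s: float) -> Optional[StartupOutcome]:
        """Return the outcome, or None if the timeout elapses first."""
        if self._event.wait(timeout_s):
            return self._outcome
        return None
```

An e2e test could then wait on such a signal before sending its first expected-allowed request, which would remove the need for ad hoc retries while leaving expected-denied requests untouched.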

Metadata

Assignees

No one assigned

Labels

  • area:policy (Policy engine and policy lifecycle work)
  • area:sandbox (Sandbox runtime and isolation work)
  • gator:follow-up-needed (Gator needs submitter or maintainer follow-up)
  • tech-debt
  • test:e2e (Requires end-to-end coverage)
  • topic:l7 (Application-layer policy and inspection work)
  • topic:testing