fix(sandbox): eliminate startup stale-policy forward proxy race #1942
Open
Labels
area:policy (Policy engine and policy lifecycle work)
area:sandbox (Sandbox runtime and isolation work)
gator:follow-up-needed (Gator needs submitter or maintainer follow-up)
tech-debt
test:e2e (Requires end-to-end coverage)
topic:l7 (Application-layer policy and inspection work)
topic:testing
Description
PR #1929 deflakes the forward proxy GraphQL L7 e2e by retrying expected-allowed requests that temporarily return `403`. The linked review explains the root cause: the startup symlink-resolution policy reload can run after the entrypoint PID and `/proc/<pid>/root` become available, advancing the policy generation while the first allowed HTTP forward request is being evaluated. That lets an allowed request observe a stale generation and receive a transient `403` even though the L7 policy should allow it.

Review discussion: #1929 (comment)
Context
The PR workaround is intentionally test-scoped: it retries only requests that the test expects to allow, and expected-denied requests stay single-shot. That avoids masking too-permissive-policy regressions, but it does not address the product race.
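The shape of that test-scoped workaround can be sketched roughly as follows. This is a minimal illustration, not the code from PR #1929: the helper name `send_with_policy_retry`, its parameters, and the retry budget are all hypothetical, and `send` stands in for whatever the e2e test uses to issue one request and return its HTTP status code.

```python
import time
from typing import Callable


def send_with_policy_retry(
    send: Callable[[], int],
    expect_allowed: bool,
    attempts: int = 5,          # hypothetical retry budget
    delay_s: float = 0.2,       # hypothetical pause between attempts
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Return the final HTTP status code observed for one logical request."""
    # Expected-denied requests stay single-shot so that a too-permissive
    # policy is never hidden behind a retry loop.
    if not expect_allowed:
        return send()

    # Expected-allowed requests are retried only while they return 403,
    # which is the transient status described in this issue.
    status = send()
    for _ in range(attempts - 1):
        if status != 403:
            break
        sleep(delay_s)
        status = send()
    return status
```

Because the retry is keyed on the test's expectation rather than on the response alone, a request that should be denied but is wrongly allowed still fails on its first and only attempt.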
Root-cause directions captured in review:
`policy.local` endpoint such as `GET http://policy.local/v1/policy/startup-ready?timeout=10`, which resolves once the initial symlink-resolution reload has succeeded, failed non-fatally, or been skipped.
`403`.
Definition of Done