fix: isolate mirror tests from host env to prevent API key leak (#522)#573
Conversation
bingran-you
left a comment
There was a problem hiding this comment.
Thermo-nuclear review verdict: request changes.
The branch fixes one mirror-test class, but the same host-env leak still exists in sibling tests in tests/test_resolve_env_helpers.py.
I reproduced both leaks locally:
GOOGLE_API_KEY=host-dummy-secret BENCHFLOW_DOTENV_PATH=/tmp/no-such-benchflow-env uv run python -m pytest tests/test_resolve_env_helpers.py::TestAutoInheritEnv::test_gemini_mirrored_to_google -q --tb=short
AZURE_RESOURCE=host-dummy-resource BENCHFLOW_DOTENV_PATH=/tmp/no-such-benchflow-env uv run python -m pytest tests/test_resolve_env_helpers.py::TestAutoInheritEnv::test_azure_endpoint_derives_resource -q --tb=shortBoth failures print the host value in the assertion diff. The narrow class-level cleanup means future mirror/derive tests can still accidentally inherit real host credentials.
Recommended fix: pure mirror/derive tests should call auto_inherit_env(env, source_env={}); inheritance-specific tests should pass an explicit source_env={...} instead of depending on or partially clearing the real process env.
Verification notes:
- Branch-head checks passed:
uv sync, fullpytest,ty,ruff. - Current-merge
pytestandtypassed. - Current-merge
rufffailed only on existing main baselineUP041insrc/benchflow/sandbox/docker.py:408, not this PR. - I also attempted real ACP E2E; provider/auth setup prevented a healthy full run for this PR, so I am not treating E2E as passed here.
Rebased onto main (picks up #587's CLAUDE_OAUTH/KIMI additions). Pure mirror/derive tests call auto_inherit_env(env, source_env={}); inheritance tests pass an explicit source_env={...} instead of monkeypatch-ing the real process env. #587's new test_claude_oauth_alias test is made hermetic too. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
775347b to
e9f7082
Compare
Thermo-nuclear code quality reviewWhat this PR is trying to solveThis PR closes #522 by making The implementation makes the unit tests explicit about their input source: inheritance tests now pass FindingsNo blocking findings. The previous host-env leakage pattern appears fixed in the touched test surfaces:
I do not see a maintainability regression. The PR does not add production branching, new helper layers, type looseness, abstraction leakage, or file-size risk. Residual risk / follow-upThe PR body says it added an autouse fixture using This is test-only and does not touch SkillsBench, trajectories, verifiers, skill loading, Daytona, or Docker behavior. I do not think a Daytona/Docker canary is required for this merge; the relevant risk is unit-test hermeticity. |
Summary
TestAutoInheritEnvBidirectionalMirrorcalledauto_inherit_env()without clearingos.environ, so real API keys (e.g.GEMINI_API_KEY) on the host leaked into the test dictautousefixture using the existing_clear_keys()helper, matching the isolation pattern already used byTestResolveAgentEnvGeminiin the same fileTest plan
test_agent_env_resolution.pypass (including the previously failingtest_gemini_only_mirrors_to_google)GEMINI_API_KEY/GOOGLE_API_KEYexported in the environmentCloses #522
🤖 Generated with Claude Code