feat: eval gate, pytest-timeout fix, README consistency (cycle 1 audit P0s) by ChunkyTortoise · Pull Request #48 · ChunkyTortoise/EnterpriseHub

ChunkyTortoise · 2026-04-26T09:56:22Z

Summary

Hero repo audit cycle 1 P0 fixes. Score: 36/50 -> 39/50.

Fix test count inconsistency across all README references (6,700 -> 7,678, consistent with badge and pytest output)
Add pytest-timeout>=0.5 to requirements-dev.txt -- resolves test_strategic_claude_consultant.py collection error
Add evals/run_evals_deterministic.py -- validates golden_dataset.json structure (50 cases, no duplicate IDs, required fields, category distribution). No API key needed.
Add evals/RESULTS.md -- 50/50 PASS, 2026-04-26
Gate deterministic evals in CI as blocking step; keep LLM-as-judge as advisory (continue-on-error)
Add make evals target
Fix Contributing section test command to use full testpaths from pytest.ini
Add MCP server row to For Hiring Managers table
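For readers who have not opened evals/run_evals_deterministic.py, the structural checks listed above could look roughly like the sketch below. It is an illustration, not the script in this PR: `EXPECTED_TOTAL` and `REQUIRED_FIELDS` are names that appear in the diff and review comments, but the field list, the category set, and the function name `validate_structure` are assumptions. Duplicate-ID detection is left out here.

```python
from collections import Counter

EXPECTED_TOTAL = 50  # case count stated in the PR description
# Assumed field set; the real REQUIRED_FIELDS in the script may differ.
REQUIRED_FIELDS = ("id", "category", "expected_output_properties")
# Hypothetical category list; only "compliance" is confirmed by the diff.
EXPECTED_CATEGORIES = {"compliance"}


def validate_structure(dataset, expected_total=EXPECTED_TOTAL):
    """Return a list of readable failures; an empty list means PASS."""
    failures = []

    # Total case count.
    if len(dataset) != expected_total:
        failures.append(f"Expected {expected_total} cases, got {len(dataset)}")

    # Required top-level fields, reported per case instead of raising.
    for index, tc in enumerate(dataset):
        missing = [field for field in REQUIRED_FIELDS if field not in tc]
        if missing:
            failures.append(f"Case #{index} missing fields: {', '.join(missing)}")

    # Category distribution: every expected category has at least one case.
    counts = Counter(tc.get("category") for tc in dataset)
    for category in sorted(EXPECTED_CATEGORIES):
        if counts[category] == 0:
            failures.append(f"No cases in category {category!r}")

    return failures
```

A caller would load golden_dataset.json with `json.load`, print each failure, and exit non-zero when the list is non-empty, which is what lets CI treat the step as blocking without an API key.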

Test plan

python evals/run_evals_deterministic.py exits 0 (50/50 PASS)
pytest tests/services/test_strategic_claude_consultant.py --collect-only collects 27 tests (no timeout marker error)
grep "7,678" README.md | wc -l returns 4 (all references consistent)
CI eval job now runs deterministic checks without secrets
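The README consistency check in the test plan could also be kept as a small Python helper so it can run alongside the test suite. This is a sketch under assumptions: the function names are hypothetical, and the expected line count of 4 is taken from the grep step above. Like `grep | wc -l`, it counts matching lines, not individual occurrences.

```python
def count_matching_lines(text, needle):
    """Count lines containing needle, mirroring `grep needle | wc -l`."""
    return sum(1 for line in text.splitlines() if needle in line)


def readme_count_problems(text, current="7,678", stale="6,700", expected_lines=4):
    """Return a list of problems with the test-count references in README text."""
    problems = []
    found = count_matching_lines(text, current)
    if found != expected_lines:
        problems.append(
            f"Expected {expected_lines} lines mentioning {current}, found {found}"
        )
    stale_lines = count_matching_lines(text, stale)
    if stale_lines:
        problems.append(f"{stale_lines} line(s) still mention {stale}")
    return problems
```

Checking for the stale figure as well as the new one catches a reference that was missed, which the grep count alone would only reveal indirectly.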

🤖 Generated with Claude Code

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…onsistency

Cycle 1 P0 fixes (score: 36/50 -> 39/50):
- Fix test count inconsistency: 6,700 -> 7,678 across all README references
- Add pytest-timeout>=0.5 to requirements-dev.txt (fixes collection error)
- Add evals/run_evals_deterministic.py: 50/50 dataset structure checks, exits 0
- Add evals/RESULTS.md documenting last successful run
- Gate deterministic evals in CI (blocking, no API key); LLM-as-judge advisory
- Add make evals target to Makefile
- Fix Contributing test command to use full pytest.ini testpaths
- Add MCP server row to For Hiring Managers table

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bed07190d2

chatgpt-codex-connector · 2026-04-26T10:01:21Z

+    if len(dataset) != EXPECTED_TOTAL:
+        failures.append(f"Expected {EXPECTED_TOTAL} cases, got {len(dataset)}")
+
+    ids = [tc["id"] for tc in dataset]


Guard missing IDs before duplicate scan

The deterministic validator reads every case ID via tc["id"] before it checks REQUIRED_FIELDS, so a single malformed record without id raises KeyError and aborts the run with a traceback. In that scenario CI still fails, but you lose the structured failure report this script is intended to produce, making dataset regressions harder to diagnose quickly.
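One way to address this, sketched under assumptions: collect IDs with `tc.get("id")`, record a failure for any case that lacks one, and run the duplicate scan only over the IDs that exist. The function name `find_id_problems` and the message wording are hypothetical, not taken from the script.

```python
from collections import Counter


def find_id_problems(dataset):
    """Report missing and duplicate case IDs without raising KeyError."""
    failures = []
    ids = []
    for index, tc in enumerate(dataset):
        case_id = tc.get("id")
        if case_id is None:
            # Malformed record: report it and keep scanning the rest.
            failures.append(f"Case #{index} has no 'id'")
            continue
        ids.append(case_id)

    # Duplicate scan over the IDs that are actually present.
    duplicates = sorted(
        (cid for cid, seen in Counter(ids).items() if seen > 1), key=str
    )
    if duplicates:
        failures.append(f"Duplicate IDs: {duplicates}")
    return failures
```

With this shape, a record without `id` produces one line in the failure report and the remaining cases are still checked.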

chatgpt-codex-connector · 2026-04-26T10:01:21Z

+    # mentions a disclosure trigger scenario.
+    for tc in dataset:
+        if tc.get("category") == "compliance":
+            props = tc["expected_output_properties"]


Skip malformed compliance cases in second pass

After the main loop already records missing top-level fields, the compliance-only pass dereferences tc["expected_output_properties"] unconditionally. A compliance case missing that field will crash with KeyError instead of being reported as a normal validation failure, which undermines the deterministic gate's usefulness when schema errors are introduced.
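A possible shape for the second pass, as a sketch only: read the field with `.get()` and skip cases where it is absent or not a dict, since the main loop has already reported the missing field. The property key `disclosure_trigger`, the function name, and the failure message are hypothetical; the actual compliance check in the script is not shown in this diff excerpt.

```python
def check_compliance_cases(dataset):
    """Second pass over compliance cases that tolerates malformed records."""
    failures = []
    for tc in dataset:
        if tc.get("category") != "compliance":
            continue
        props = tc.get("expected_output_properties")
        if not isinstance(props, dict):
            # Already reported as a missing field by the main loop; skip here.
            continue
        # Hypothetical key standing in for the real disclosure check.
        if not props.get("disclosure_trigger"):
            case_id = tc.get("id", "<no id>")
            failures.append(f"{case_id}: compliance case lacks a disclosure trigger")
    return failures
```

Skipping instead of re-reporting keeps the failure list free of duplicate entries for the same malformed case.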

chunktort and others added 2 commits April 16, 2026 20:35

docs: update test count badge to 7,678 and consolidate badges

69767c0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

ChunkyTortoise wants to merge 2 commits into main from feat/hiring-signal-enhancement
