fix: honest precision tiers + chat-online boundary + README perf reconciliation#197

Open
rylinjames wants to merge 1 commit into main from fix/honesty-3-findings

@rylinjames
Collaborator

Three honesty fixes — the kind that matter once VERIFICATION.md and the README are sales artifacts a QA/compliance team reads.

1. Verification ledger over-labeled precision

The ledger labeled every fixture from 6e-7 to 4.25e-4 "machine precision", a ~700x spread under one green check. The Eagle VLM stack at 4.25e-4 is within fp16 tolerance, not fp32 machine precision, and it is the first row a QA reviewer pulls.

  • Added _precision_tier() to verification_report.py so the generated VERIFICATION.md self-classifies each fixture: <=2e-6 machine precision (fp32) / <=1e-4 tight tolerance / <=5e-3 fp16 tolerance / else DIVERGENT - review.
  • Retiered the README ledger rows (Eagle 4.25e-4 -> fp16 tolerance; DiT 1.78e-5 -> tight tolerance) + added a legend. Fixed the dead measured_numbers.md link.

2. reflex chat silently required network (contradicts offline/private pitch)

chat routes to chat.fastcrest.com (GPT-5 Mini). For the defense/legal/offline positioning that is the inverse of the promise.

  • chat now raises OfflineError on network failure, stating plainly that the offline/air-gap guarantee is the serving path (reflex serve / /act, fully on-device), not the chat helper; self-host via FASTCREST_PROXY_URL.
  • Corrected the README Blackwell "chat-only mode (no GPU needed)" workaround to note it requires network.
  • (Follow-up option: a local chat fallback if we want chat itself air-gappable - flagged, not built.)
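
A minimal sketch of the failure boundary described above, using only the standard library. `OfflineError` and `FASTCREST_PROXY_URL` are named in this PR; the `send_chat` helper, the `https://` scheme, the JSON payload shape, and the injectable `opener` parameter are hypothetical and only illustrate the idea.

```python
import json
import os
import urllib.error
import urllib.request

# Host taken from the PR description; the scheme and bare path are assumed.
DEFAULT_CHAT_URL = "https://chat.fastcrest.com"


class OfflineError(RuntimeError):
    """Raised when the chat helper cannot reach its proxy."""


def send_chat(prompt, opener=urllib.request.urlopen, timeout=10.0):
    """Hypothetical helper: post a prompt and return the raw response text."""
    url = os.environ.get("FASTCREST_PROXY_URL", DEFAULT_CHAT_URL)
    request = urllib.request.Request(
        url,
        data=json.dumps({"prompt": prompt}).encode("utf-8"),  # payload shape assumed
        headers={"Content-Type": "application/json"},
    )
    try:
        with opener(request, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError:
        # The server answered, so this is not a connectivity problem.
        raise
    except (urllib.error.URLError, TimeoutError, ConnectionError) as exc:
        raise OfflineError(
            f"reflex chat could not reach {url}. The offline/air-gap guarantee "
            "covers the serving path (reflex serve, /act), not the chat helper. "
            "Set FASTCREST_PROXY_URL to use a self-hosted proxy."
        ) from exc
```

Passing the opener as a parameter lets the offline branch be exercised in tests without touching the network.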

3. README perf self-contradiction

The Performance section publishes a 5.55x TRT table, but a later line said "latency numbers intentionally not in the README yet," and the only batching figure (2.3-2.9x) was on the abandoned decomposed-ONNX path.

  • The "no latency yet" line now points at the real published Performance number; the 2.3-2.9x batching figure is marked deprecated (abandoned path, not re-measured).

Test plan

  • 155 passed across chat/verification/backend/report tests
  • Python compiles; _precision_tier maps 4.25e-4 -> "fp16 tolerance"
  • No behavior change to other verbs

Co-Authored-By: Claude Opus 4.7 (1M context)

…rf reconciliation

Three honesty fixes for when VERIFICATION.md + README become sales artifacts: (1) the verification ledger labeled everything 6e-7..4.25e-4 as 'machine precision'; added a _precision_tier() classifier to the generated receipt (<=2e-6 machine / <=1e-4 tight / <=5e-3 fp16 / else DIVERGENT) and retiered the Eagle VLM (4.25e-4 -> fp16 tolerance) + DiT (1.78e-5 -> tight) rows; (2) reflex chat silently required network (routes to chat.fastcrest.com), which contradicts the offline/private pitch; it now raises OfflineError stating the offline guarantee is the serving path not chat, and the README Blackwell 'chat-only' workaround is corrected; (3) README perf was self-contradictory: the 'latency not in README' line now points at the real published Performance table and the 2.3-2.9x batching figure (abandoned decomposed-ONNX path) is marked deprecated.

155 chat/verification tests pass. No scope/behavior change to other verbs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>