draft: add steering eval experiment on provisional MC#6

Draft
williamli-15 wants to merge 30 commits into main from feat/steering-eval

@williamli-15

Summary

This PR adds a formal steering evaluation entrypoint on the current SynthPersona provisional multiple-choice benchmark.

Changes

  • added experiments/01_steering_eval.py
  • added src/persona_vectors/eval.py for MC answer-space scoring
  • added qid metadata to activation artifacts and used it for safer steering-vector alignment when available
  • updated steering docs and README to point to the formal experiment entrypoint
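
The qid-based alignment mentioned above could look roughly like the following. This is a minimal sketch, not the code in this PR: the function names `align_by_qid` and `steering_vectors`, the plain-list representation of activations, and the positional fallback are all assumptions made for illustration. It also assumes qids are unique within each artifact.

```python
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def align_by_qid(
    templated_qids: Sequence[str], biography_qids: Sequence[str]
) -> List[Tuple[int, int]]:
    """Return (i, j) index pairs where templated_qids[i] == biography_qids[j]."""
    position = {qid: j for j, qid in enumerate(biography_qids)}
    return [
        (i, position[qid])
        for i, qid in enumerate(templated_qids)
        if qid in position
    ]


def steering_vectors(
    templated: Sequence[Vector],
    biography: Sequence[Vector],
    templated_qids: Optional[Sequence[str]] = None,
    biography_qids: Optional[Sequence[str]] = None,
) -> List[Vector]:
    """Compute biography - templated per item (hypothetical helper).

    When both qid lists are available, rows are matched by qid; otherwise
    the sketch falls back to positional matching and requires equal lengths.
    """
    if templated_qids is not None and biography_qids is not None:
        pairs = align_by_qid(templated_qids, biography_qids)
    else:
        if len(templated) != len(biography):
            raise ValueError("cannot align by position: lengths differ")
        pairs = [(i, i) for i in range(len(templated))]
    return [
        [b - t for t, b in zip(templated[i], biography[j])]
        for i, j in pairs
    ]
```

Matching on qid rather than row order means a reordered or partially missing artifact yields fewer vectors instead of silently pairing activations from different questions.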

Result

The repo now supports a benchmark-style steering loop instead of only saving vectors:

  • extract templated / biography activations
  • compute steering vectors as biography minus templated activations
  • evaluate the bare, templated, biography, and steered conditions on the same MC items
  • report target-answer probability, accuracy, and flip statistics
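
The reported metrics can be illustrated with a small sketch. It assumes each item comes with one logit per answer option, and that a "flip" means the predicted option moved onto or off the target between a baseline condition and the steered condition. The names `answer_probs` and `summarize` and these definitions are illustrative assumptions, not the contents of src/persona_vectors/eval.py.

```python
import math
from typing import Dict, List, Sequence


def answer_probs(option_logits: Sequence[float]) -> List[float]:
    """Softmax restricted to the answer options (the MC answer space)."""
    m = max(option_logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in option_logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value; the first index wins on ties."""
    return max(range(len(values)), key=values.__getitem__)


def summarize(
    baseline_logits: Sequence[Sequence[float]],
    steered_logits: Sequence[Sequence[float]],
    targets: Sequence[int],
) -> Dict[str, float]:
    """Aggregate target probability, accuracy, and flip counts (assumed definitions)."""
    n = len(targets)
    target_prob = 0.0
    base_correct = steer_correct = to_target = away = 0
    for base, steer, target in zip(baseline_logits, steered_logits, targets):
        target_prob += answer_probs(steer)[target]
        base_hit = argmax(base) == target
        steer_hit = argmax(steer) == target
        base_correct += base_hit
        steer_correct += steer_hit
        to_target += (not base_hit) and steer_hit    # wrong -> right
        away += base_hit and (not steer_hit)         # right -> wrong
    return {
        "mean_target_prob": target_prob / n,
        "baseline_accuracy": base_correct / n,
        "steered_accuracy": steer_correct / n,
        "flips_to_target": to_target,
        "flips_away_from_target": away,
    }
```

Normalizing over the option logits only, rather than over the full vocabulary, keeps the target-answer probability comparable across conditions even when a prompt shifts probability mass toward non-answer tokens.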

Validation

  • uv run python -m py_compile main.py src/persona_vectors/artifacts.py src/persona_vectors/extraction.py src/persona_vectors/steering.py src/persona_vectors/eval.py experiments/01_steering_eval.py tests/smoke_test.py
  • uv run python tests/smoke_test.py
  • uv run python experiments/01_steering_eval.py --help

Note

A remote dry-run was not completed from this environment because NDIF_API_KEY is not configured here.

@Jac-Zac Jac-Zac force-pushed the main branch 4 times, most recently from 17ecba0 to 9aafab8 Compare May 10, 2026 09:46