draft: add steering eval experiment on provisional MC by williamli-15 · Pull Request #6 · implicit-personalization/persona-vectors

williamli-15 · 2026-04-17T15:44:15Z

Summary

This PR adds a formal steering evaluation entrypoint on the current SynthPersona provisional multiple-choice benchmark.

Changes

added experiments/01_steering_eval.py
added src/persona_vectors/eval.py for MC answer-space scoring
added qid metadata to activation artifacts; when present, it is used for safer steering-vector alignment
updated steering docs and README to point to the formal experiment entrypoint
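
A minimal sketch of the qid-based alignment described above, assuming activations are stored as a mapping from qid to a per-item vector and that the steering vector is the mean of the paired differences. The function name `steering_vector`, the dict layout, and the mean-difference choice are illustrative assumptions, not the actual contents of src/persona_vectors/steering.py.

```python
def steering_vector(templated, biography):
    """Mean of (biography - templated) over items that share a qid.

    Both arguments are hypothetical mappings: qid -> activation vector
    (a list of floats). Items present in only one set are ignored, so a
    reordered or partially missing artifact cannot misalign the pairs.
    """
    shared = sorted(set(templated) & set(biography))
    if not shared:
        raise ValueError("no shared qids between the two activation sets")

    dim = len(templated[shared[0]])
    total = [0.0] * dim
    for qid in shared:
        for k in range(dim):
            total[k] += biography[qid][k] - templated[qid][k]

    # Average the per-item differences into a single direction.
    return [x / len(shared) for x in total]


templated = {"q1": [1.0, 0.0], "q2": [0.0, 1.0], "q3": [5.0, 5.0]}
biography = {"q2": [0.0, 3.0], "q1": [2.0, 0.0]}  # different order, no q3
vector = steering_vector(templated, biography)
```

Pairing by qid rather than by position means the two extraction runs do not need to emit items in the same order, which is presumably what makes the alignment "safer".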

Result

The repo now supports a benchmark-style steering loop instead of only saving vectors:

extract templated / biography activations
compute biography - templated steering vectors
evaluate the bare, templated, biography, and steered conditions on the same MC items
report target-answer probability, accuracy, and flip statistics
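
The reporting step above can be sketched as follows, assuming each MC item carries the logits of its answer options under every condition. The item layout, the function names `answer_space_probs` and `summarize`, and the flip definitions (a change of the argmax answer relative to a baseline condition) are assumptions for illustration; the scoring in src/persona_vectors/eval.py may differ.

```python
import math


def answer_space_probs(option_logits):
    """Softmax restricted to the answer options, e.g. {"A": 1.2, "B": 0.3}."""
    m = max(option_logits.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in option_logits.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}


def summarize(items, condition, baseline):
    """Compare one condition against a baseline on the same MC items."""
    target_prob = 0.0
    correct = flips_to_target = flips_away = 0
    for item in items:
        probs = answer_space_probs(item[condition])
        base = answer_space_probs(item[baseline])
        pred = max(probs, key=probs.get)
        base_pred = max(base, key=base.get)
        target = item["target"]

        target_prob += probs[target]
        correct += pred == target
        flips_to_target += base_pred != target and pred == target
        flips_away += base_pred == target and pred != target

    n = len(items)
    return {
        "mean_target_prob": target_prob / n,
        "accuracy": correct / n,
        "flips_to_target": flips_to_target,
        "flips_away": flips_away,
    }


items = [
    {"target": "A", "bare": {"A": 0.0, "B": 1.0}, "steered": {"A": 2.0, "B": 0.0}},
    {"target": "B", "bare": {"A": 0.0, "B": 1.0}, "steered": {"A": 1.0, "B": 0.0}},
]
report = summarize(items, condition="steered", baseline="bare")
```

Normalizing over the answer options only, rather than the full vocabulary, keeps the target-answer probability comparable across conditions even when a prompt shifts mass toward non-answer tokens.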

Validation

uv run python -m py_compile main.py src/persona_vectors/artifacts.py src/persona_vectors/extraction.py src/persona_vectors/steering.py src/persona_vectors/eval.py experiments/01_steering_eval.py tests/smoke_test.py
uv run python tests/smoke_test.py
uv run python experiments/01_steering_eval.py --help

Note

A remote dry-run was not completed from this environment because NDIF_API_KEY is not configured here.

williamli-15 added 2 commits April 17, 2026 08:43

feat: add steering eval experiment on provisional MC

3f698f0

fix: correct steering apply shape in MC eval

eec496f

williamli-15 force-pushed the feat/steering-eval branch from 7369f62 to eec496f Compare April 22, 2026 01:55

williamli-15 added 27 commits April 23, 2026 12:07

feat: add corrected-contract persona steering eval

28e4a61

feat: add projected steering rerun

3209989

feat: add item-conditioned steering oracle

9870d8f

feat: add prompt-side steering diagnostics

cadd85e

feat: parameterize prompt-side steering runner

8563913

feat: parameterize projected steering runner

8c00474

feat: add generation actadd smoke test

de06df1

feat: add causal steering smoke scripts

aae5e4d

feat: allow centering override for projected steering

d15f003

feat: add additive final-token patch mode

ea1302b

feat: add baseline prompt-last q20 runner

711d41a

feat: add leave-one-out steering diagnostic

6f67168

feat: add attribute direction steering smoke

0d018c2

add trait direction suite

e3db08b

add trait option rotation control

81e3f07

batch trait scoring for option controls

c86e987

add letter balanced trait extraction

c9e546f

add response mean direction suite

226bd68

add attribute MC steering probes

6168f0f

norm match attribute steering control

060799f

add logged attribute layer sweep runner

848a150

add response mean attribute result summarizer

1c51544

include vector norms in attribute steering summary

12643f6

persist response mean activation cache

27332c6

support response mean activation cache reuse

daf60f5

ignore archived experiment artifacts

2dfc575

add layer sweep timeout guard

1ab2ef8

add attribute MC item analyzer

7e85fd6

Jac-Zac force-pushed the main branch 4 times, most recently from 17ecba0 to 9aafab8 Compare May 10, 2026 09:46

Jac-Zac force-pushed the main branch from 90d5ffe to 6ca72bb Compare May 16, 2026 22:10

williamli-15 wants to merge 30 commits into main from feat/steering-eval
