
[quantization] [draft] Fix truthfulqa #620

Draft
stamalakhov wants to merge 1 commit into Samsung:main from stamalakhov:truthfulqa

@stamalakhov
Contributor

This PR fixes truthfulqa evaluation.

Results for a 16-bit run of `HuggingFaceTB/SmolLM2-135M-Instruct`.

Original results:

|    Tasks     |Version|Filter|n-shot|  Metric   |   | Value |   |Stderr|
|--------------|------:|------|-----:|-----------|---|------:|---|-----:|
|truthfulqa_gen|      3|none  |     0|bleu_acc   |↑  | 0.2913|±  |0.0159|
|              |       |none  |     0|bleu_diff  |↑  |-5.2094|±  |0.6062|
|              |       |none  |     0|bleu_max   |↑  |21.3007|±  |0.7145|
|              |       |none  |     0|rouge1_acc |↑  | 0.2938|±  |0.0159|
|              |       |none  |     0|rouge1_diff|↑  |-7.5999|±  |0.6972|
|              |       |none  |     0|rouge1_max |↑  |46.2006|±  |0.8337|
|              |       |none  |     0|rouge2_acc |↑  | 0.2362|±  |0.0149|
|              |       |none  |     0|rouge2_diff|↑  |-8.3506|±  |0.7790|
|              |       |none  |     0|rouge2_max |↑  |29.6729|±  |0.9197|
|              |       |none  |     0|rougeL_acc |↑  | 0.2827|±  |0.0158|
|              |       |none  |     0|rougeL_diff|↑  |-7.4784|±  |0.6796|
|              |       |none  |     0|rougeL_max |↑  |43.0500|±  |0.8333|
|truthfulqa_mc1|      2|none  |     0|acc        |↑  | 0.2399|±  |0.0149|
|truthfulqa_mc2|      3|none  |     0|acc        |↑  | 0.4178|±  |0.0148|

Quantized results:

|    Tasks     |Version|Filter|n-shot|  Metric   |   | Value |   |Stderr|
|--------------|------:|------|-----:|-----------|---|------:|---|-----:|
|truthfulqa_gen|      3|none  |     0|bleu_acc   |↑  | 0.3121|±  |0.0162|
|              |       |none  |     0|bleu_diff  |↑  |-3.3805|±  |0.5129|
|              |       |none  |     0|bleu_max   |↑  |16.8512|±  |0.6326|
|              |       |none  |     0|rouge1_acc |↑  | 0.3305|±  |0.0165|
|              |       |none  |     0|rouge1_diff|↑  |-5.3833|±  |0.6805|
|              |       |none  |     0|rouge1_max |↑  |40.3370|±  |0.8086|
|              |       |none  |     0|rouge2_acc |↑  | 0.2277|±  |0.0147|
|              |       |none  |     0|rouge2_diff|↑  |-5.7747|±  |0.7057|
|              |       |none  |     0|rouge2_max |↑  |23.0924|±  |0.8486|
|              |       |none  |     0|rougeL_acc |↑  | 0.3121|±  |0.0162|
|              |       |none  |     0|rougeL_diff|↑  |-5.3809|±  |0.6535|
|              |       |none  |     0|rougeL_max |↑  |37.0933|±  |0.7974|
|truthfulqa_mc1|      2|none  |     0|acc        |↑  | 0.2485|±  |0.0151|
|truthfulqa_mc2|      3|none  |     0|acc        |↑  | 0.4126|±  |0.0148|

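The accuracy metrics are similar between the two runs (for example, `truthfulqa_mc1` acc 0.2399 vs. 0.2485 and `truthfulqa_mc2` acc 0.4178 vs. 0.4126), although the `*_max` scores are lower for the quantized model.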
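
As a rough sanity check on that comparison, the sketch below divides each accuracy difference by the combined standard error. The values are copied from the two tables above; the helper names (`z_score`, `compare`) and the shortened metric keys are hypothetical and not part of this PR. It treats the two runs as independent, which is a simplifying assumption since both runs score the same questions.

```python
import math

# (value, stderr) pairs copied from the tables above; keys are shortened labels.
original = {
    "bleu_acc": (0.2913, 0.0159),
    "rouge1_acc": (0.2938, 0.0159),
    "rouge2_acc": (0.2362, 0.0149),
    "rougeL_acc": (0.2827, 0.0158),
    "mc1_acc": (0.2399, 0.0149),
    "mc2_acc": (0.4178, 0.0148),
}
quantized = {
    "bleu_acc": (0.3121, 0.0162),
    "rouge1_acc": (0.3305, 0.0165),
    "rouge2_acc": (0.2277, 0.0147),
    "rougeL_acc": (0.3121, 0.0162),
    "mc1_acc": (0.2485, 0.0151),
    "mc2_acc": (0.4126, 0.0148),
}


def z_score(a, b):
    """Return (b - a) divided by the combined standard error.

    Assumes the two measurements are independent (a simplification).
    """
    (value_a, stderr_a), (value_b, stderr_b) = a, b
    return (value_b - value_a) / math.sqrt(stderr_a**2 + stderr_b**2)


def compare(orig, quant):
    """Map each metric name to the z-score of quantized minus original."""
    return {name: z_score(orig[name], quant[name]) for name in orig}


if __name__ == "__main__":
    for name, z in compare(original, quantized).items():
        print(f"{name:12s} z = {z:+.2f}")
```

Under that assumption, every accuracy metric listed here lands below 2 combined standard errors in magnitude. The `*_max` and `*_diff` scores are not included in this check.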

TODO: calibrate kv-cache

Related: #617
TICO-DCO-1.0-Signed-off-by: s.malakhov <s.malakhov@partner.samsung.com>

stamalakhov self-assigned this Apr 12, 2026

This PR fixes `truthfulqa` evaluation.

TICO-DCO-1.0-Signed-off-by: s.malakhov <s.malakhov@partner.samsung.com>