
[quantization] Save torch artifacts#625

Open
stamalakhov wants to merge 1 commit into Samsung:main from stamalakhov:save_torch_artifacts

Conversation

Contributor

@stamalakhov commented Apr 13, 2026

This PR saves torch artifacts to be used later in evaluation or quantization.

sample output for `Maykeye/TinyLLama-v0`
Namespace(model='Maykeye/TinyLLama-v0', device='cuda', dtype='float32', seed=42, trust_remote_code=False, hf_token=None, no_tqdm=False, no_GPTQ=False, no_spinquant=False, no_PTQ=False, save_circle_to_folder=None, save_layers_to_folder=None, save_torch_artifacts_to_folder='.', cache_dir='/mnt/storage/transformers_cache', nsamples_for_qcalibration=32, linear_weight_bits=4, gptq_mse='smse', max_seq_len=2048, calibrate_seq_len=2048, embedding_weight_bits=8, lm_head_weight_bits=4, eval_tasks=None, sensitivity_path=None)
=== Config ===
Model            : Maykeye/TinyLLama-v0
Device           : cuda
DType            : float32

Loading FP model …
Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
Loading weights: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 75/75 [00:00<00:00, 1714.24it/s]
Applying SpinQuant preprocessing …
Applying SpinQuant rotations: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:00<00:00, 274.02it/s]

Calculating original perplexities …
Token indices sequence length is longer than the specified maximum sequence length for this model (324381 > 2048). Running this sequence through the model will result in indexing errors
PPL:   0%|                                                     | 0/159 [00:00<?, ?it/s]
/mnt/storage/slow_repos/TICO_1/TICO/tico/quantization/algorithm/spinquant/spin_llama.py:150: FutureWarning: `input_embeds` is deprecated and will be removed in version 5.6.0 for `create_causal_mask`. Use `inputs_embeds` instead.
  causal_mask = create_causal_mask(
PPL:  99%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋ | 158/159 [00:04<00:00, 33.52it/s]

┌── Wikitext-2 test perplexity ─────────────
│ FP32 :  7584.31
└───────────────────────────────────────────
Applying GPTQ …
Computing calibration set
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 32/32 [00:00<00:00, 34.30it/s]
Calibrating sensitivity
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 32/32 [00:04<00:00,  6.53it/s]
Saving calibrated_sensitivities to sensitivities_for_Maykeye_TinyLLama-v0_wikitext_32_42.pt
Quantizing layers: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:12<00:00,  1.62s/layer]
Wrapping layers with PTQWrapper …                                                                                                                                                                                                                        
Calibrating PTQ obeservers…
  0%|                                                           | 0/32 [00:00<?, ?it/s]
`use_return_dict` is deprecated! Use `return_dict` instead!
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 32/32 [00:17<00:00,  1.83it/s]
Saving PTQ model to PTQ_Maykeye_TinyLLama-v0_SpinQuant_GPTQ_smse_32_42.pt

Calculating perplexities …
PPL:  99%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋ | 158/159 [00:59<00:00,  2.66it/s]

┌── Wikitext-2 test perplexity ─────────────
│ int16 :  7181.82
└───────────────────────────────────────────

TICO-DCO-1.0-Signed-off-by: s.malakhov s.malakhov@partner.samsung.com

@stamalakhov self-assigned this Apr 13, 2026
This PR saves torch artifacts to be used later in evaluation or quantization.

TICO-DCO-1.0-Signed-off-by: s.malakhov <s.malakhov@partner.samsung.com>
@stamalakhov force-pushed the save_torch_artifacts branch from 71fc131 to 750ab2d on April 13, 2026 09:46
@stamalakhov marked this pull request as ready for review April 13, 2026 09:51
@stamalakhov requested a review from mhs4670go April 13, 2026 09:51
"--save_torch_artifacts_to_folder",
type=str,
default=None,
help="Save all layers to the folder specified",
Contributor

The help message should be changed; the current one seems to have been copied from another option.
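One possible rewording (a sketch only; the exact help text is up to the author):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--save_torch_artifacts_to_folder",
    type=str,
    default=None,
    # Suggested wording (an assumption, not the PR's final text):
    # describe the torch artifacts being saved, not "all layers".
    help="Save torch artifacts (PTQ checkpoint, GPTQ sensitivities) to this folder",
)

args = parser.parse_args(["--save_torch_artifacts_to_folder", "."])
```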

save_name = get_ptq_model_name(model, args)
save_path = pathlib.Path(args.save_torch_artifacts_to_folder, save_name)
print(f"Saving PTQ model to {save_path}")
torch.save(q_m, save_path)
Contributor

Could you let me know what this saving is for?

Additionally, saving the entire module object with torch.save(q_m, ...) may be fragile across code changes and environments.
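A common, more robust pattern (a sketch, not code from this PR; `q_m` here is a hypothetical stand-in for the PTQ-wrapped model) is to persist only the state dict and rebuild the module in code before loading:

```python
import pathlib
import tempfile

import torch
import torch.nn as nn

# Hypothetical stand-in for the PTQ-wrapped model in this PR.
q_m = nn.Linear(4, 2)

save_path = pathlib.Path(tempfile.mkdtemp()) / "ptq_model.pt"

# Fragile: torch.save(q_m, save_path) pickles the module's class by
# reference, so loading breaks if the class moves or its code changes.

# More robust: persist only tensors; the loading side reconstructs the
# module from code and then restores the weights.
torch.save({"state_dict": q_m.state_dict()}, save_path)

# Later / in another process: rebuild the module, then load weights only.
rebuilt = nn.Linear(4, 2)
ckpt = torch.load(save_path, weights_only=True)
rebuilt.load_state_dict(ckpt["state_dict"])
```

`weights_only=True` (available in recent PyTorch releases) also avoids executing arbitrary pickled code at load time.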

Contributor Author

@stamalakhov commented Apr 13, 2026
@mhs4670go
please see #626; we can use it to evaluate the saved model. GPTQ can be time-consuming, as can evaluation of the quantized model. If we have a saved model, we can later re-evaluate it against a different set of benchmarks. Currently a single environment is assumed.

Contributor

Got it. How about adding a comment like this above the function call?

# Save quantized model for later re-evaluation (used by eval script to skip re-quantization)
torch.save(q_m, save_path)

Contributor Author

ok.

help="Save all layers to the folder specified",
)
parser.add_argument(
"--save_torch_artifacts_to_folder",
Contributor

save_torch_artifacts_to_folder feels too broad and a bit misleading for what this option actually does.

Currently, this single flag is used to save multiple outputs (e.g., GPTQ sensitivities and a PTQ model checkpoint), which have different purposes and lifecycles.

Also, this is inconsistent with the existing saving options (save_circle_to_folder, save_layers_to_folder), which are all explicit about what they save. In contrast, this new option is abstract and mixes multiple artifact types under one flag.

It would be better to use a more specific name (e.g., save_quant_artifacts_to_folder) so that the behavior is more explicit, predictable, and consistent with the existing CLI design.

Contributor

I’m also a bit concerned about the overall save interface as it seems to be growing option by option.

At the moment, the save-related flags do not follow a single clear abstraction: some are named by format (circle), some by granularity (layers), and this new one by a very broad implementation-oriented term (torch artifacts). As more save targets get added, this may become harder to understand and maintain from the CLI side.

It may be worth considering a more structured interface, for example:

  • one output directory option, and
  • a separate argument that explicitly lists which artifacts to save

This would likely scale better than continuing to add one save flag per artifact/output type.

One possible example is shown below.

--output_dir ./outputs
--save sensitivity,ptq_checkpoint
# OR
--save model_circle sensitivity ptq_checkpoint

If you don't have enough time for refactoring, you can just add a comment as TODO.
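The proposed interface could be sketched with argparse roughly as below (the flag names and artifact keys are illustrative assumptions, not an agreed design):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # One output directory for everything that gets written to disk.
    parser.add_argument(
        "--output_dir",
        type=str,
        default="./outputs",
        help="Directory where all requested artifacts are written",
    )
    # One flag listing which artifacts to save, instead of one flag per artifact.
    parser.add_argument(
        "--save",
        nargs="*",
        default=[],
        choices=["model_circle", "sensitivity", "ptq_checkpoint"],
        help="Artifacts to save into --output_dir",
    )
    return parser


args = build_parser().parse_args(
    ["--output_dir", "./outputs", "--save", "sensitivity", "ptq_checkpoint"]
)
```

With `nargs="*"` each listed value is validated against `choices`, so a typo in an artifact name fails fast at parse time rather than silently skipping a save.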

Contributor Author

Ahh. Ok. Let it be one folder with multiple options. I'll rework this PR.
