Skip to content

Validate token-pruning ratio and unify keep-count resolution#342

Open
SuperMarioYL wants to merge 2 commits into
Tencent:mainfrom
SuperMarioYL:fix/token-pruning-ratio-validation
Open

Validate token-pruning ratio and unify keep-count resolution#342
SuperMarioYL wants to merge 2 commits into
Tencent:mainfrom
SuperMarioYL:fix/token-pruning-ratio-validation

Conversation

@SuperMarioYL

Copy link
Copy Markdown
Contributor

Summary

Every token-compressor pruning strategy turns a drop ratio into an absolute
keep count via int(round(num_vision_tokens * (1 - ratio))). Today each strategy
does this inline, with its own ad-hoc "retain at least one token" guard — and
none of them validate that ratio actually lies in [0, 1].

That gap is a real defect:

  • ratio > 1 makes 1 - ratio negative, so num_to_keep goes negative and
    flows straight into torch.empty(num_to_keep, ...) (divprune) /
    torch.topk(..., k=min(num_to_keep, n)) (attention_based, vispruner) / the
    pivot math in dart — which raises an opaque CUDA/CPU error deep inside torch,
    far from the actual cause.
  • ratio < 0 over-counts and silently changes the kept-token set.
  • The retain-one guard is inconsistent: hiprune doesn't apply it at all, so a
    benign rounding-to-zero there drops the entire image.

This isn't hypothetical — the shipped config
configs/qwen2_5_vl/pruning/vision_selector_r0.9.yaml
carries ratio: 9 even though the filename says r0.9, so running it today
crashes instead of pruning to 10%.

Changes

  • Add a single shared helper resolve_num_tokens_to_keep(ratio, num_vision_tokens)
    in algorithm/utils/utils.py that:
    • rejects a non-numeric / bool / NaN / out-of-[0,1] ratio with a clear
      [TokenCompressor Error] 'ratio' must be in [0.0, 1.0], got <x> message;
    • computes round(num_vision_tokens * (1 - ratio));
    • applies one uniform "retain at least one token when ratio < 1.0" rule.
  • Route all eight strategies (basic, attention_based, dart, divprune,
    hiprune, idpruner, scope, vision_selector) through the helper, removing
    the duplicated inline blocks. Behaviour is unchanged for valid ratios; hiprune
    gains the retain-one guard for consistency.
  • Fix the ratio: 90.9 typo in the vision_selector config so the example
    actually runs.
  • Add a CPU-only regression test (tests/test_token_pruning_ratio.py) that pins
    the contract — it stubs the torch import so it needs neither a GPU nor model
    weights.

Testing

$ python -m pytest tests/test_token_pruning_ratio.py -q
18 passed

Formatting matches the repo pre-commit hooks: black --line-length=99,
isort --profile=black, and flake8 (E203,E704,W503,W504 ignored) are all clean.

Notes

Out-of-range ratios were never a working path (they crashed or corrupted), so
raising a descriptive ValueError instead is strictly safer and not a behavioural
regression for any valid configuration.

Every token_compressor pruning strategy converts a drop ratio into an
absolute keep count via round(num_vision_tokens * (1 - ratio)), each with
its own ad-hoc 'retain at least one token' guard (missing entirely in
hiprune) and none validating that ratio lies in [0, 1].

An out-of-range ratio therefore flows straight into the keep count: a value
> 1 makes (1 - ratio) negative, yielding a negative num_to_keep that crashes
torch.empty / torch.topk, while a value < 0 over-counts. The shipped config
configs/qwen2_5_vl/pruning/vision_selector_r0.9.yaml even carries ratio: 9
(the filename says r0.9), so this is a live defect, not a hypothetical one.

Add a shared resolve_num_tokens_to_keep(ratio, n) helper that validates the
ratio, computes the keep count, and applies a single uniform retain-one rule,
then route all eight strategies through it. Fix the config typo to 0.9 and add
a CPU-only regression test for the helper's contract.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant