
Commit a99f503
Remove unused mip functions + fix multi-gpu test (#660)
## What does this PR do?

**Type of change:** Improvement

- Fix tests for 2-GPU runs: some places hard-coded the CPU device for distributed communication, which caused failures
- Remove unused constrain_search_space.py
- Remove the `is_multi_layer_puzzle: False` case
- Remove the `use_greedy_search: False` case
- Remove the knapsack MIP case
- Remove the unused `num_solutions` and `minimal_diversity` flags

## Testing

- GH CI/CD tests passing
- Also tested locally on a 2-GPU setup

## Summary by CodeRabbit

- **Refactor**
  - Optimized solver implementation with improved library integration.
  - Simplified model compression configuration by removing deprecated search options.
  - Consolidated optimization paths for streamlined processing.
- **Chores**
  - Updated dependencies for improved compatibility.
- **Documentation**
  - Clarified Model-Optimizer installation instructions in examples.

Signed-off-by: Keval Morabia <[email protected]>
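The 2-GPU fix described above follows a common pattern in PyTorch distributed code; below is a minimal sketch of that pattern, not code from this repository (`gather_metric` and its logic are illustrative). Collectives that build tensors on a hard-coded CPU device fail or hang under the NCCL backend, which requires CUDA tensors; selecting the device from the active backend makes the same code work on both CPU (gloo) and multi-GPU (NCCL) runs.

```python
import torch
import torch.distributed as dist


def gather_metric(value: float) -> float:
    """Average a scalar metric across all ranks (hypothetical example)."""
    # Buggy variant: t = torch.tensor(value, device="cpu")
    # -> fails/hangs when the process group uses the NCCL backend.
    if dist.get_backend() == "nccl":
        device = torch.device("cuda", torch.cuda.current_device())
    else:
        device = torch.device("cpu")
    t = torch.tensor(value, device=device)
    # Sum across all ranks, then divide by world size to get the mean.
    dist.all_reduce(t, op=dist.ReduceOp.SUM)
    return t.item() / dist.get_world_size()
```

On a single-process gloo group this reduces to the identity; under NCCL each rank contributes a CUDA tensor, which is what the hard-coded CPU device broke.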
1 parent a1f63bc commit a99f503

File tree

15 files changed: +82 −986 lines

examples/compress/README.md

Lines changed: 2 additions & 2 deletions

````diff
@@ -13,7 +13,7 @@ In this example, we compress the [meta-llama/Llama-3.1-8B-Instruct](https://hugg
 
 ## Environment
 
-- Install TensorRT-Model-Optimizer in editable mode with the corresponding dependencies:
+- Install Model-Optimizer in editable mode with the corresponding dependencies:
 
 ```bash
 pip install -e .[hf,compress]
@@ -94,7 +94,7 @@ pip install -e .[hf,compress]
 block_29: attention gqa_4 ffn intermediate_14336
 block_30: attention gqa_4 ffn intermediate_14336
 block_31: attention gqa_4 ffn intermediate_14336
-
+
 [2025-11-02 04:53:11,332][rank-0][run_puzzle.py:295] Total costs: {'stats.memory_mib': 75796.4140625, 'stats.ffn_num_params': 5637275648, 'stats.num_kv_heads': 160, 'stats.kv_cache_memory_mib': 61440.0, 'stats.ffn_memory_mib': 10752.25, 'stats.attention_memory_mib': 63040.15625, 'stats.attention_num_params': 838942720, 'stats.num_params': 7526895616, 'stats.has_attention': 20, 'stats.has_ffn': 32}
 ...
 ################################################################
````

examples/compress/configs/llama-3_1-8B_pruneffn_memory/Llama-3_1-8B.yaml

Lines changed: 5 additions & 10 deletions

```diff
@@ -9,7 +9,7 @@ defaults:
 puzzle_dir: ???
 teacher_dir: ${puzzle_dir}/ckpts/teacher/
 replacement_library_path: ${puzzle_dir}/replacement_library.json
-dataset_path: ??? # path to v0.4_mini
+dataset_path: ??? # path to v0.4_mini
 
 skip_realize_model: false
 
@@ -21,10 +21,10 @@ calc_subblock_stats:
   batch_sizes: [64, 96, 128]
   prefill_seq_len: 4096
   generation_seq_len: 4096
-  num_active_tokens_override: # Optional override for sequence lengths
+  num_active_tokens_override: # Optional override for sequence lengths
   prefill_queue_size: 0
   allocate_prefill_query: false
-  benchmark_iterations: # Set to a number (e.g., 1000) to enable runtime benchmarking
+  benchmark_iterations: # Set to a number (e.g., 1000) to enable runtime benchmarking
   merge_with_existing_stats: false
   subblock_stats_filename: "subblock_stats.json"
   moe_stats_filename: "moe_stats.json"
@@ -56,8 +56,6 @@ mip:
   # puzzle_profile:
   objective: metrics.cosine_embedding_loss_hidden_states
   bigger_is_better: false
-  num_solutions: 1
-  minimal_diversity: 2
 
   subblock_stats_args:
     - batch_size: 96
@@ -81,21 +79,18 @@ mip:
     target_memory: 78_000
 
   mip_constraints:
-    use_greedy_search: false
-    is_multi_layer_puzzle: true
     metric_overrides:
-    constrain_search_func:
     max_seconds_per_solution: 60
 
 realize_model:
   teacher_dir: ${to_path:${teacher_dir}}
   tokenizer_name: ${to_path:${teacher_dir}}
   replacement_library_path: ${replacement_library_path}
   save_models: true
-  solutions_path: # Filled dynamically
+  solutions_path: # Filled dynamically
 
   # Validate params
-  skip_validation: false # To enable validation of the model solution set `skip_validation` as False
+  skip_validation: false # To enable validation of the model solution set `skip_validation` as False
   eval_samples: 128
   micro_batch_size: 1
   seed: 42
```
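For reference, a sketch of how the trimmed `mip` search configuration reads after this commit, reconstructed from the hunks above (indentation is approximate and surrounding keys are elided):

```yaml
mip:
  objective: metrics.cosine_embedding_loss_hidden_states
  bigger_is_better: false

  mip_constraints:
    metric_overrides:
    max_seconds_per_solution: 60
```

The removed keys (`num_solutions`, `minimal_diversity`, `use_greedy_search`, `is_multi_layer_puzzle`, `constrain_search_func`) correspond to the deprecated search options listed in the PR description.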

examples/pruning/README.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -23,7 +23,7 @@ This section focuses on applying Model Optimizer's state-of-the-art complementar
 
 </div>
 
-For more advanced pruning strategies, such as the [Puzzle methodology](https://arxiv.org/pdf/2411.19146), please see [Puzzle pruning example](https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/feature/compress/examples/compress).
+For more advanced pruning strategies, such as the [Puzzle methodology](https://arxiv.org/pdf/2411.19146), please see [Puzzle pruning example](../compress/README.md).
 
 ## Pre-Requisites
 
```
