Quan Zhou1* · Shaoqing Zhai1* · Qiang Hu2,† · Jia Chen3 · Qiang Li2 · Zhiwei Wang2,†
1Wuhan University of Technology 2Huazhong University of Science and Technology
3Changzhou United Imaging Surgical Co., Ltd.
*co-first author †corresponding author
This work presents Mask-to-Concept (M2C), an efficient fine-tuning strategy for SAM3.
Overview of the M2C-based human-in-the-loop annotation system.
- Pixel-space diffusion generation (operating directly in image space, without VAE or latent representations), capable of producing flying-pixel-free point clouds from estimated depth maps.
- Our model integrates the discriminative representation (ViT) into generative modeling (DiT), fully leveraging the strengths of both paradigms.
- Our network architecture is purely transformer-based, containing no convolutional layers.
- Although our model is trained at a fixed resolution of 1024×768, it can flexibly support various input resolutions and aspect ratios during inference.
- 2026-03: Code, models, and demo have been released.
- Please refer to the official environment configuration of SAM3 to set up your Python environment.
- Download the official SAM3 weights and put them in the checkpoint/ directory.
Download the following datasets and organize them as follows:
datasets/
├── Kvasir-SEG/
├── ISIC-2017/
To perform few-shot evaluation (e.g., 1-shot), split the Kvasir-SEG dataset into a support set and a query set (ratio 1:9). Place them in:
datasets/Kavsir-seg/support/
datasets/Kavsir-seg/query/
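One way to produce the 1:9 support/query split is sketched below. It is a hypothetical helper, not part of the M2C code: it assumes the public Kvasir-SEG layout (an images/ and a masks/ folder whose files share names), and the function names split_names and split_dataset, the fixed seed, and the images/ and masks/ sub-folders created inside support/ and query/ are all illustrative assumptions. Check which layout controller.py and test.py actually read before relying on it.

```python
# Hypothetical helper (not part of the M2C repository): split Kvasir-SEG into
# support/query sets at a 1:9 ratio. Assumes <src>/images and <src>/masks hold
# image/mask pairs with identical filenames; the output layout
# <dst>/{support,query}/{images,masks} is also an assumption.
import random
import shutil
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def split_names(names, support_ratio=0.1, seed=0):
    """Shuffle filenames reproducibly and return (support, query) lists."""
    names = sorted(names)  # sort first so the result does not depend on listing order
    random.Random(seed).shuffle(names)
    n_support = max(1, round(len(names) * support_ratio))
    return names[:n_support], names[n_support:]


def copy_pairs(names, src, dst):
    """Copy each image and its same-named mask into dst/images and dst/masks."""
    for sub in ("images", "masks"):
        (dst / sub).mkdir(parents=True, exist_ok=True)
    for name in names:
        for sub in ("images", "masks"):
            shutil.copy2(src / sub / name, dst / sub / name)


def split_dataset(src, dst, support_ratio=0.1, seed=0):
    """Split src into dst/support and dst/query; return the two set sizes."""
    src, dst = Path(src), Path(dst)
    names = [p.name for p in (src / "images").iterdir()
             if p.suffix.lower() in IMAGE_EXTS]
    support, query = split_names(names, support_ratio, seed)
    copy_pairs(support, src, dst / "support")
    copy_pairs(query, src, dst / "query")
    return len(support), len(query)
```

For example, split_dataset("datasets/Kvasir-SEG", "datasets/Kavsir-seg") would copy roughly 10% of the image/mask pairs into support/ and the rest into query/. Fixing the seed keeps the split reproducible, so repeated evaluations use the same query set.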
Step 1: Run the Controller
python controller.py --pool_root "datasets/Kavsir-seg/support" --n_shot 1 --few_shot
Step 2: Run Evaluation
python test.py --test_pool_root "datasets/Kavsir-seg/query"
To simulate the full annotation workflow on the entire dataset (e.g., a 5-shot setup), put the full dataset in datasets/Kavsir-seg/ and run:
python controller.py --pool_root "datasets/Kavsir-seg" --n_shot 5
We are grateful to the Segment Anything Model 3 (SAM3) team for releasing their code and models.
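The three commands in this README (few-shot controller, evaluation, and full-dataset workflow) can be chained from Python, for instance when sweeping several shot counts. The sketch below is a hypothetical convenience wrapper, not part of the repository: it uses only the script names, flags, and paths shown above, assumes it is run from the repository root, and the helper names few_shot_commands, full_workflow_command, and run are illustrative.

```python
# Hypothetical wrapper around the README commands; only the scripts, flags,
# and paths documented above are used. Run from the repository root.
import subprocess
import sys

SUPPORT = "datasets/Kavsir-seg/support"
QUERY = "datasets/Kavsir-seg/query"
FULL = "datasets/Kavsir-seg"


def few_shot_commands(n_shot=1):
    """Step 1 (controller on the support set) and Step 2 (evaluation on the query set)."""
    return [
        [sys.executable, "controller.py", "--pool_root", SUPPORT,
         "--n_shot", str(n_shot), "--few_shot"],
        [sys.executable, "test.py", "--test_pool_root", QUERY],
    ]


def full_workflow_command(n_shot=5):
    """Simulated annotation workflow over the whole dataset."""
    return [sys.executable, "controller.py", "--pool_root", FULL,
            "--n_shot", str(n_shot)]


def run(commands):
    """Run commands in order; check=True stops at the first non-zero exit code."""
    for cmd in commands:
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

For example, run(few_shot_commands(1)) reproduces Steps 1 and 2, and run([full_workflow_command(5)]) reproduces the 5-shot full-dataset simulation. Using sys.executable keeps the child processes in the same Python environment as the wrapper.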
