Add Qwen3-TTS VoiceDesign vLLM-Omni launcher #135

Open
yfchoco208 wants to merge 1 commit into swiss-ai:main from yfchoco208:add-qwen3-tts-voicedesign

Conversation

@yfchoco208
Collaborator

Adds examples/clariden/cli/qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign-vllm-omni.sh, a single-node launcher for Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign that serves text-to-speech via vLLM-Omni on Clariden GH200.

Adds images/vllm_qwen3_tts_cuda13/Dockerfile and src/swiss_ai_model_launch/assets/envs/vllm_qwen3_tts_cuda13.toml, a CUDA13 vLLM-Omni TTS environment with vllm==0.20.2, vllm-omni==0.20.0, transformers==5.8.0, and audio dependencies such as FFmpeg, libsndfile, and soundfile.

Adds Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign to src/swiss_ai_model_launch/assets/models.json, an interactive SML catalog entry using vLLM-Omni with --max-model-len 8192 and --gpu-memory-utilization 0.40. VoiceDesign was tested with task_type=VoiceDesign and text instructions rather than preset CustomVoice speakers.

Also adds vllm-omni as a supported framework where required, matching the existing vLLM-Omni serving pattern.

Validated from a clean checkout:

  • sml advanced launch works
  • interactive sml catalog launch works

@yfchoco208 force-pushed the add-qwen3-tts-voicedesign branch 2 times, most recently from d1fca8c to b7f21f4, May 20, 2026 05:06
Member

@AryanAhadinia left a comment

Thanks a lot for your contribution! We would love to merge your PR once the comments below are addressed. Keep up the great work!

Please also note that your PR now has conflicts that need to be resolved before merging.

Comment on lines +86 to +89
vllm-omni)
FRAMEWORK_ENV_SETUP="export RAY_CGRAPH_get_timeout=1800; export no_proxy=\"0.0.0.0,\$no_proxy\"; export NO_PROXY=\"0.0.0.0,\$NO_PROXY\""
FRAMEWORK_LAUNCH="vllm serve"
;;
Member

@AryanAhadinia May 20, 2026

This case seems redundant to me, as it is identical to the vLLM case. We could replace python3 -m vllm.entrypoints.openai.api_server with vllm serve, since the two are equivalent and the former is deprecated. Note, however, that we massively refactored the codebase in #100 and removed the template.jinja file entirely. The job script is now rendered at runtime in framework.py.


  model: str
- framework: Literal["sglang", "vllm"]
+ framework: Literal["sglang", "vllm", "vllm-omni"]
Member

Adding vLLM-Omni alongside vLLM as a new framework would need a strong justification. Our long-term vision is to have two golden base images, one for vLLM and one for SGL (ref: #118). I would therefore suggest dropping vllm-omni as a new framework for now and using --environment/--slurm-environment to specify which toml file you want to use.

Collaborator Author

Thank you for clarifying. I will remove vllm-omni as a new framework and stick to the original vllm.

Comment thread images/vllm_qwen3_tts_cuda13/Dockerfile Outdated
Member

Isn't it possible to patch the current vLLM image?

Collaborator Author

Just to clarify: what do you mean by “patch the current vLLM image”?

I'm not sure which of the following you meant:

  1. Use the existing vLLM CUDA13 Docker image as a base, if it exists, and make vllm_qwen3_tts_cuda13 a derived image that only adds vllm-omni and the audio dependencies.
  2. Modify the current vllm_cuda13 Dockerfile (image) itself to include vllm-omni and the audio dependencies.

Member

The second one. In general, we are working toward keeping the number of images and environments as small as possible, so adding a new image and environment for only a small class of models is not well aligned with our long-term goals.

Collaborator Author

I worked on patching the existing images/vllm_cuda13/Dockerfile.

  • Removed the idea of adding vllm-omni as a new SML framework.
  • Switched the Qwen3-TTS entry to the existing framework: vllm and made the launch path use the existing vllm environment pattern.

I found a compatibility issue: adding vllm-omni==0.20.0 on top of the current vllm cuda13 image caused an import failure, because the current image has vllm 0.21.1rc..., whereas vllm-omni==0.20.0 expects the vLLM 0.20 API layout. I therefore validated the following combination:

  • vllm==0.20.2
  • vllm-omni==0.20.0
  • transformers==5.8.0
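
The pinning above can be read as a simple rule of thumb. The sketch below assumes that a vllm-omni X.Y.* release targets the vLLM X.Y.* API layout; that rule is inferred from the import failure described here and is not a documented guarantee. The helper names and the 0.21.0 version string are illustrative only.

```python
import re
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '0.20.2'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def same_release_line(vllm_version: str, omni_version: str) -> bool:
    # Assumption: matching major.minor is what makes the pair importable.
    return major_minor(vllm_version) == major_minor(omni_version)


if __name__ == "__main__":
    # The validated pair from this comment.
    print(same_release_line("0.20.2", "0.20.0"))
    # A hypothetical 0.21-series vLLM paired with vllm-omni 0.20.0.
    print(same_release_line("0.21.0", "0.20.0"))
```

A check like this could run at image build time so that a mismatched pair fails before any model is launched.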

I tested the patched vllm_cuda13 image from a temporary .sqsh so that I did not overwrite the shared image. Qwen3-TTS VoiceDesign starts and /v1/audio/speech generates WAV output. A normal vLLM text model (swiss-ai/Apertus-8B-Instruct-2509) also starts with --enforce-eager, and /v1/chat/completions returns a valid response.
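
A smoke test of this kind could look roughly like the sketch below. It only builds the request and checks a WAV header, without sending anything. The JSON field names (input, task_type, instructions, response_format) are assumptions modelled on the OpenAI-style speech API and on the task_type=VoiceDesign setting mentioned in the PR description, and the base URL is a placeholder; none of them are taken from the repository.

```python
import json
import urllib.request

MODEL = "Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign"


def build_speech_request(base_url: str, text: str, voice_description: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/audio/speech."""
    payload = {
        "model": MODEL,
        "input": text,                      # assumed field name
        "task_type": "VoiceDesign",         # assumed field name
        "instructions": voice_description,  # assumed field name
        "response_format": "wav",           # assumed field name
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_wav(data: bytes) -> bool:
    # A RIFF/WAVE file starts with b"RIFF", four size bytes, then b"WAVE".
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"


if __name__ == "__main__":
    # Placeholder address; replace with the address of the running server.
    request = build_speech_request(
        "http://localhost:8000", "Hello from Clariden.", "A calm, low-pitched voice."
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen and running is_wav on the response body would complete the check against a live server.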

Since the patch seems to be working with Qwen3-TTS and other models that use the docker image, can I proceed with patching the existing vllm_cuda13 Dockerfile?

I just want to confirm first whether it is okay to pin the shared vllm cuda13 image to vllm==0.20.2 for compatibility with vllm-omni==0.20.0.

Member

Thanks a lot! My suggestion is to create a new Dockerfile, vllm_cuda13_v2, and put your patched Dockerfile there. Since the changes are now minimal, it should hopefully be easy for us to replace the original image with yours.

There are ongoing efforts in fixing some bugs that we have with vLLM (#126) and building golden docker images (#118 and #93). The complete replacement of the current vLLM image with yours will take place after the resolution of the aforementioned issues and PRs.

Please note that the CI pipeline should automatically build the Docker image for you and place it beside the other Docker images.

Again, thanks a lot for your great work!

Collaborator Author

Hi, I've pushed the updated version, but the two remaining CI failures seem to be related to the repository's CI configuration or secrets.

For Docker Build vllm_cuda13_v2, the job seems to fail during FirecREST initialization, before the image build, with this error message:

requests.exceptions.MissingSchema: Invalid URL '': No scheme supplied. Perhaps you meant https://?

I manually built and tested vllm_cuda13_v2 on Clariden using a temporary sqsh and toml, but the official GitHub image build seems to require repository FirecREST secrets to build vllm_cuda13_v2.sqsh.
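
For context, requests raises MissingSchema whenever the URL string has no scheme, which includes an empty string, so this traceback is consistent with the FirecREST URL being empty or unset in the CI job. Below is a minimal sketch of a check that would fail earlier with a clearer message, assuming the URL is read from an environment variable. FIRECREST_URL is a hypothetical name, not one taken from the repository's workflow.

```python
import os
from typing import Optional
from urllib.parse import urlparse


def require_http_url(name: str, value: Optional[str]) -> str:
    """Return value if it is an http(s) URL with a host, else raise ValueError."""
    if not value:
        raise ValueError(f"{name} is empty or unset (is the CI secret available?)")
    parsed = urlparse(value)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"{name}={value!r} is not an http(s) URL")
    return value


if __name__ == "__main__":
    try:
        # Hypothetical variable name, used for illustration only.
        url = require_http_url("FIRECREST_URL", os.environ.get("FIRECREST_URL"))
        print(f"using {url}")
    except ValueError as err:
        print(f"configuration error: {err}")
```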

For SonarCloud / analyze, the message also suggests a missing token/project permission issue:

Warning: Running this GitHub Action without SONAR_TOKEN is not recommended

Project not found. Please check the 'sonar.projectKey' and 'sonar.organization' properties, the 'SONAR_TOKEN' environment variable, or contact the project administrator to check the permissions of the user the token belongs to

I would greatly appreciate your help in resolving this, whether it turns out to be a problem with the Dockerfile or a missing secret/key. Thank you!

@AryanAhadinia added the model-support label (Adding support for a new model) May 20, 2026
@yfchoco208 force-pushed the add-qwen3-tts-voicedesign branch from b7f21f4 to 94adb31, May 23, 2026 16:55
@yfchoco208 force-pushed the add-qwen3-tts-voicedesign branch from 94adb31 to 6510e4a, May 23, 2026 17:04


2 participants