Add Qwen3-TTS VoiceDesign vLLM-Omni launcher #135
vllm-omni)
    FRAMEWORK_ENV_SETUP="export RAY_CGRAPH_get_timeout=1800; export no_proxy=\"0.0.0.0,\$no_proxy\"; export NO_PROXY=\"0.0.0.0,\$NO_PROXY\""
    FRAMEWORK_LAUNCH="vllm serve"
    ;;
This case does seem redundant to me, as it is identical to the vLLM case. We could replace python3 -m vllm.entrypoints.openai.api_server with vllm serve, since the two are equivalent and the former is deprecated. Nevertheless, please note that we massively refactored the codebase in #100 and the template.jinja file has been removed entirely. Instead, we now render the job script at runtime in framework.py.
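As a rough illustration of rendering the job script at runtime, the sketch below keeps one entry per framework and builds the script text from it. The names `FrameworkSpec`, `FRAMEWORKS`, and `render_job_script` are hypothetical and are not taken from framework.py; only the environment setup and the `vllm serve` launch command come from the snippet under review, and they are filed under `vllm` on the assumption that the two cases really are identical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameworkSpec:
    """Hypothetical per-framework settings (not the real framework.py types)."""
    env_setup: str
    launch: str


# Only the values below come from the reviewed snippet; the layout is assumed.
FRAMEWORKS = {
    "vllm": FrameworkSpec(
        env_setup=(
            "export RAY_CGRAPH_get_timeout=1800; "
            'export no_proxy="0.0.0.0,$no_proxy"; '
            'export NO_PROXY="0.0.0.0,$NO_PROXY"'
        ),
        launch="vllm serve",
    ),
}


def render_job_script(framework: str, model: str, extra_args: str = "") -> str:
    """Build the job script text for one framework/model pair."""
    spec = FRAMEWORKS[framework]  # raises KeyError for an unknown framework
    command = f"{spec.launch} {model} {extra_args}".rstrip()
    return "\n".join(["#!/bin/bash", spec.env_setup, command]) + "\n"
```

With a layout like this, a model that needs vLLM-Omni would reuse the `vllm` entry and differ only in the environment file it selects, so no extra framework key is needed.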
  model: str
- framework: Literal["sglang", "vllm"]
+ framework: Literal["sglang", "vllm", "vllm-omni"]
Adding vLLM-Omni beside vLLM as a new framework should be well justified. Our long-term vision is to have two golden base images, one for vLLM and one for SGL (ref: #118). I would therefore suggest dropping vllm-omni as a new framework for now and just using --environment/--slurm-environment to specify which toml file you want to use.
Thank you for clarifying. I will remove vllm-omni as a new framework and stick with the original vllm.
Isn't it possible to patch the current vLLM image?
Could you clarify what you mean by “patch the current vLLM image”?
I'm not sure which of the following you meant:
- Use the existing Docker vLLM CUDA13 base image, if it exists, and make vllm_qwen3_tts_cuda13 (a derived image) that only adds vllm-omni and audio dependencies.
- Modify the current vllm_cuda13 Dockerfile (image) itself to include vllm-omni and audio dependencies.
The second one. In general, we are working toward keeping the number of images and environments as small as possible, so adding a new image and environment for only a small class of models is not well aligned with our long-term goals.
I worked on patching the existing images/vllm_cuda13/Dockerfile.
- Removed the idea of adding vllm-omni as a new SML framework.
- Switched the Qwen3-TTS entry to use the existing framework: vllm and made the launch path use the existing vllm environment pattern.
I found a compatibility issue: adding vllm-omni==0.20.0 on top of the current vllm cuda13 image caused an import failure, because the current image has vllm 0.21.1rc..., whereas vllm-omni==0.20.0 expects the vLLM 0.20 API layout. I therefore validated the following combination:
- vllm==0.20.2
- vllm-omni==0.20.0
- transformers==5.8.0
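A pinned combination like this can be checked inside the image before anything is served. The sketch below compares installed package versions against the three pins using the standard-library `importlib.metadata`. The helper names (`PINS`, `installed_versions`, `find_mismatches`, `main`) are illustrative only and are not part of the PR.

```python
from importlib.metadata import PackageNotFoundError, version

# The combination reported as validated in this thread.
PINS = {"vllm": "0.20.2", "vllm-omni": "0.20.0", "transformers": "5.8.0"}


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(installed, pins=PINS):
    """Return {package: (expected, found)} for every pin that is not met."""
    return {
        name: (want, installed.get(name))
        for name, want in pins.items()
        if installed.get(name) != want
    }


def main():
    """Print any mismatches and return a shell-style exit code."""
    mismatches = find_mismatches(installed_versions(PINS))
    for name, (want, got) in mismatches.items():
        print(f"{name}: expected {want}, found {got}")
    return 1 if mismatches else 0
```

Calling `main()` as a late step of the image build would make a drift in the vLLM version fail early, rather than surfacing later as an import error when the server starts.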
I tested the patched vllm_cuda13 image using a temporary .sqsh so that I did not overwrite the shared image. Qwen3-TTS VoiceDesign starts and /v1/audio/speech generates WAV output. A normal vLLM text model (swiss-ai/Apertus-8B-Instruct-2509) also starts with --enforce-eager, and /v1/chat/completions returns a valid response.
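A smoke test along these lines could be scripted roughly as follows. This is a sketch under stated assumptions: the JSON field names `task_type` and `instructions` are guessed from the wording of the PR description and are not confirmed against the vLLM-Omni API, and the base URL and the example text are placeholders. The WAV check relies only on the RIFF container layout, which starts with `RIFF`, four size bytes, then `WAVE`.

```python
import json
import urllib.request


def build_speech_request(base_url, model, text, instructions):
    """Build a POST request for /v1/audio/speech (payload fields are assumed)."""
    payload = {
        "model": model,
        "input": text,
        "task_type": "VoiceDesign",    # assumed field name
        "instructions": instructions,  # assumed field name
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def looks_like_wav(data: bytes) -> bool:
    """True if the bytes begin with a RIFF/WAVE container header."""
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"


def smoke_test(base_url, model, timeout=120):
    """Send one request to a running server and check that WAV bytes come back."""
    request = build_speech_request(
        base_url, model, "Hello from Clariden.", "A calm, warm voice."
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return looks_like_wav(response.read())
```

Checking the header rather than only the HTTP status catches the case where the server answers 200 with a JSON error body instead of audio.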
Since the patch seems to work with Qwen3-TTS and with other models that use the Docker image, can I proceed with patching the existing vllm_cuda13 Dockerfile?
First, I just want to verify whether it is okay to pin the shared vllm cuda13 image to vllm==0.20.2 for compatibility with vllm-omni==0.20.0.
Thanks a lot! I would suggest creating a new Dockerfile, vllm_cuda13_v2, and putting your patched Dockerfile there. Since the changes are now minimal, it should hopefully be easy for us to replace the original image with yours.
There are ongoing efforts to fix some bugs that we have with vLLM (#126) and to build golden Docker images (#118 and #93). The complete replacement of the current vLLM image with yours will take place after those issues and PRs are resolved.
Please note that the CI pipeline should automatically build the Docker image for you and place it alongside the other Docker images.
Again, thanks a lot for your great work!
Hi, I've pushed the updated version, but the two remaining CI failures seem to be related to the repository's CI configuration or keys.
For Docker Build vllm_cuda13_v2, the job seems to fail during FirecREST initialization, before the image build, with the error message:
requests.exceptions.MissingSchema: Invalid URL '': No scheme supplied. Perhaps you meant https://?
I manually built and tested vllm_cuda13_v2 on Clariden using a temporary sqsh and toml, but the official GitHub image build seems to require repository FirecREST secrets to build vllm_cuda13_v2.sqsh.
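The MissingSchema error above is what `requests` raises for an empty URL string, which is consistent with a secret that expands to nothing in the CI job. A build script could fail earlier and more readably by validating the value first, as in the sketch below. The function name `require_url` and the variable name `FIRECREST_URL` in the usage note are hypothetical and are not taken from the repository's workflow.

```python
import os
from urllib.parse import urlparse


def require_url(name, env=os.environ):
    """Return env[name] if it is an http(s) URL, else raise a clear error."""
    value = env.get(name, "").strip()
    parsed = urlparse(value)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise RuntimeError(f"{name} is unset or not an http(s) URL: {value!r}")
    return value
```

Calling `require_url("FIRECREST_URL")` at the top of the job would then report the missing configuration by name instead of failing inside the HTTP client.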
For SonarCloud / analyze, the message also suggests a missing token/project permission issue:
Warning: Running this GitHub Action without SONAR_TOKEN is not recommended
Project not found. Please check the 'sonar.projectKey' and 'sonar.organization' properties, the 'SONAR_TOKEN' environment variable, or contact the project administrator to check the permissions of the user the token belongs to
I would greatly appreciate your help in resolving this, whether it is an issue with the Dockerfile or with a missing secret/key. Thank you!
- Adds examples/clariden/cli/qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign-vllm-omni.sh, a single-node launcher for Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign, serving text-to-speech via vLLM-Omni on Clariden GH200.
- Adds images/vllm_qwen3_tts_cuda13/Dockerfile and src/swiss_ai_model_launch/assets/envs/vllm_qwen3_tts_cuda13.toml, a CUDA13 vLLM-Omni TTS environment with vllm==0.20.2, vllm-omni==0.20.0, transformers==5.8.0, and audio dependencies such as FFmpeg, libsndfile, and soundfile.
- Adds Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign to src/swiss_ai_model_launch/assets/models.json, an interactive SML catalog entry using vLLM-Omni with --max-model-len 8192 and --gpu-memory-utilization 0.40. VoiceDesign was tested with task_type=VoiceDesign and text instructions rather than preset CustomVoice speakers.
- Also adds vllm-omni as a supported framework where required, matching the existing vLLM-Omni serving pattern.

Validated from a clean checkout:
- sml advanced launch works
- sml catalog launch works