(venv) root@da3fc99b80cc:/FV/TurboDiffusion# CUDA_LAUNCH_BLOCKING=1 PYTHONPATH=turbodiffusion python turbodiffusion/inference/wan2.1_t2v_infer.py --model Wan2.1-1.3B --dit_path checkpoints/TurboWan2.1-T2V-1.3B-480P-quant.pth --resolution 480p --prompt "A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about." --num_samples 1 --num_steps 4 --quant_linear --attention_type sagesla --sla_topk 0.1
Megatron-core is not installed.
[03-05 00:57:38|INFO|turbodiffusion/inference/wan2.1_t2v_infer.py:74:<module>] Computing embedding for prompt: A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.
[03-05 00:57:38|INFO|turbodiffusion/rcm/utils/umt5.py:495:__init__] loading checkpoints/models_t5_umt5-xxl-enc-bf16.pth
[03-05 00:57:55|INFO|turbodiffusion/inference/wan2.1_t2v_infer.py:79:<module>] Loading DiT model from checkpoints/TurboWan2.1-T2V-1.3B-480P-quant.pth
[03-05 00:57:55|INFO|turbodiffusion/rcm/networks/wan2pt1.py:802:enable_selective_checkpoint] Enable selective checkpoint with mm_only, for every 1 blocks. Total blocks: 30
[03-05 00:57:58|SUCCESS|turbodiffusion/inference/wan2.1_t2v_infer.py:82:<module>] Successfully loaded DiT model.
[03-05 00:57:59|INFO|turbodiffusion/rcm/tokenizers/wan2pt1.py:592:_video_vae] loading checkpoints/Wan2.1_VAE.pth
[03-05 00:57:59|INFO|turbodiffusion/inference/wan2.1_t2v_infer.py:88:<module>] Generating with prompt: A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.
Sampling: 0%| | 0/4 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/FV/TurboDiffusion/turbodiffusion/inference/wan2.1_t2v_infer.py", line 131, in <module>
v_pred = net(x_B_C_T_H_W=x.to(**tensor_kwargs), timesteps_B_T=(t_cur.float() * ones * 1000).to(**tensor_kwargs), **condition).to(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/FV/TurboDiffusion/turbodiffusion/rcm/networks/wan2pt1.py", line 698, in forward
x_B_L_D = block(x_B_L_D, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/torch/distributed/algorithms/_checkpoint/checkpoint_wrapper.py", line 168, in forward
return self.checkpoint_fn( # type: ignore[misc]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/torch/_compile.py", line 54, in inner
return disable_fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1181, in _fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/torch/utils/checkpoint.py", line 512, in checkpoint
ret = function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/FV/TurboDiffusion/turbodiffusion/rcm/networks/wan2pt1.py", line 416, in forward
x = cross_attn_ffn(x, context, context_lens, e)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/FV/TurboDiffusion/turbodiffusion/rcm/networks/wan2pt1.py", line 410, in cross_attn_ffn
x = x + self.cross_attn(self.norm3(x), context, context_lens)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/FV/TurboDiffusion/turbodiffusion/rcm/networks/wan2pt1.py", line 292, in forward
k = self.norm_k(self.k(context)).view(b, -1, n, d)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/FV/TurboDiffusion/turbodiffusion/ops/core.py", line 442, in forward
return rmsnorm(x.float(), self.weight, self.eps).to(x.dtype)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/FV/TurboDiffusion/turbodiffusion/ops/core.py", line 173, in rmsnorm
_rms_norm_fwd_fused[(triton.cdiv(M, BLOCK_M),)]( #
File "/opt/venv/lib/python3.12/site-packages/triton/runtime/jit.py", line 370, in <lambda>
return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/triton/runtime/jit.py", line 744, in run
kernel.run(grid_0, grid_1, grid_2, stream, kernel.function, kernel.packed_metadata, launch_metadata,
File "/opt/venv/lib/python3.12/site-packages/triton/backends/nvidia/driver.py", line 713, in __call__
self.launch(gridX, gridY, gridZ, stream, function, self.launch_cooperative_grid, self.launch_pdl,
RuntimeError: Triton Error [CUDA]: an illegal memory access was encountered
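Since the failure is a CUDA illegal memory access raised from a Triton kernel launch, one common cause on an RTX 5090 (Blackwell, compute capability 12.0) is a PyTorch or Triton build that lacks sm_120 support. A quick environment check in plain PyTorch, with no TurboDiffusion code involved, can confirm what the toolchain actually sees:

```python
import torch

# Print the build and device info relevant to kernel compatibility.
# The RTX 5090 should report compute capability (12, 0); if "sm_120"
# is missing from the arch list, kernels may be JIT-compiled for an
# older architecture, which can surface as illegal memory accesses.
print("torch:", torch.__version__, "| CUDA:", torch.version.cuda)
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
    print("capability:", torch.cuda.get_device_capability(0))
    print("arch list:", torch.cuda.get_arch_list())
```

If sm_120 is absent, rebuilding PyTorch/Triton (or installing a nightly with Blackwell support) is worth trying before digging into the kernel itself.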
I am running inference on a single RTX 5090, after compiling TurboDiffusion and SpargeAttn from scratch, and the command above fails with the error shown.
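To rule out the fused Triton RMSNorm kernel itself, a plain-PyTorch reference with the same call shape seen in the traceback (`rmsnorm(x.float(), self.weight, self.eps)`) can be swapped in as a sanity check. This is a hypothetical drop-in, assuming the kernel computes standard RMSNorm over the last dimension:

```python
import torch

def rmsnorm_ref(x: torch.Tensor, weight: torch.Tensor, eps: float) -> torch.Tensor:
    """Plain-PyTorch RMSNorm: x * rsqrt(mean(x^2, dim=-1) + eps) * weight."""
    variance = x.pow(2).mean(dim=-1, keepdim=True)
    return x * torch.rsqrt(variance + eps) * weight
```

If inference completes with this reference substituted for the Triton path in `turbodiffusion/ops/core.py`, the bug is in the kernel launch (e.g. grid/block configuration on sm_120) rather than in the quantized weights.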