Illegal Memory Access on 5090

I run into the following error when trying to run inference:

```
(venv) root@da3fc99b80cc:/FV/TurboDiffusion# CUDA_LAUNCH_BLOCKING=1 PYTHONPATH=turbodiffusion python turbodiffusion/inference/wan2.1_t2v_infer.py \
    --model Wan2.1-1.3B \
    --dit_path checkpoints/TurboWan2.1-T2V-1.3B-480P-quant.pth \
    --resolution 480p \
    --prompt "A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about." \
    --num_samples 1 \
    --num_steps 4 \
    --quant_linear \
    --attention_type sagesla \
    --sla_topk 0.1
Megatron-core is not installed.
[03-05 00:57:38|INFO|turbodiffusion/inference/wan2.1_t2v_infer.py:74:<module>] Computing embedding for prompt: A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.
[03-05 00:57:38|INFO|turbodiffusion/rcm/utils/umt5.py:495:__init__] loading checkpoints/models_t5_umt5-xxl-enc-bf16.pth
[03-05 00:57:55|INFO|turbodiffusion/inference/wan2.1_t2v_infer.py:79:<module>] Loading DiT model from checkpoints/TurboWan2.1-T2V-1.3B-480P-quant.pth
[03-05 00:57:55|INFO|turbodiffusion/rcm/networks/wan2pt1.py:802:enable_selective_checkpoint] Enable selective checkpoint with mm_only, for every 1 blocks. Total blocks: 30
[03-05 00:57:58|SUCCESS|turbodiffusion/inference/wan2.1_t2v_infer.py:82:<module>] Successfully loaded DiT model.
[03-05 00:57:59|INFO|turbodiffusion/rcm/tokenizers/wan2pt1.py:592:_video_vae] loading checkpoints/Wan2.1_VAE.pth
[03-05 00:57:59|INFO|turbodiffusion/inference/wan2.1_t2v_infer.py:88:<module>] Generating with prompt: A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.
Sampling:   0%|                                                                                                                         | 0/4 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/FV/TurboDiffusion/turbodiffusion/inference/wan2.1_t2v_infer.py", line 131, in <module>
    v_pred = net(x_B_C_T_H_W=x.to(**tensor_kwargs), timesteps_B_T=(t_cur.float() * ones * 1000).to(**tensor_kwargs), **condition).to(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/FV/TurboDiffusion/turbodiffusion/rcm/networks/wan2pt1.py", line 698, in forward
    x_B_L_D = block(x_B_L_D, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/distributed/algorithms/_checkpoint/checkpoint_wrapper.py", line 168, in forward
    return self.checkpoint_fn(  # type: ignore[misc]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/_compile.py", line 54, in inner
    return disable_fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1181, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/utils/checkpoint.py", line 512, in checkpoint
    ret = function(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/FV/TurboDiffusion/turbodiffusion/rcm/networks/wan2pt1.py", line 416, in forward
    x = cross_attn_ffn(x, context, context_lens, e)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/FV/TurboDiffusion/turbodiffusion/rcm/networks/wan2pt1.py", line 410, in cross_attn_ffn
    x = x + self.cross_attn(self.norm3(x), context, context_lens)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/FV/TurboDiffusion/turbodiffusion/rcm/networks/wan2pt1.py", line 292, in forward
    k = self.norm_k(self.k(context)).view(b, -1, n, d)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/FV/TurboDiffusion/turbodiffusion/ops/core.py", line 442, in forward
    return rmsnorm(x.float(), self.weight, self.eps).to(x.dtype)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/FV/TurboDiffusion/turbodiffusion/ops/core.py", line 173, in rmsnorm
    _rms_norm_fwd_fused[(triton.cdiv(M, BLOCK_M),)](  #
  File "/opt/venv/lib/python3.12/site-packages/triton/runtime/jit.py", line 370, in <lambda>
    return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/triton/runtime/jit.py", line 744, in run
    kernel.run(grid_0, grid_1, grid_2, stream, kernel.function, kernel.packed_metadata, launch_metadata,
  File "/opt/venv/lib/python3.12/site-packages/triton/backends/nvidia/driver.py", line 713, in __call__
    self.launch(gridX, gridY, gridZ, stream, function, self.launch_cooperative_grid, self.launch_pdl,
RuntimeError: Triton Error [CUDA]: an illegal memory access was encountered
```

I am running inference on a single RTX 5090, after building TurboDiffusion and SpargeAttn from source.
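
For triage, the version details that usually matter for a Triton illegal memory access can be gathered with a short script. This is a minimal sketch, not part of TurboDiffusion: the helper name `collect_env` is hypothetical, and it only assumes that `torch` and `triton` may or may not be importable in the active environment.

```python
import importlib
import platform


def collect_env():
    """Return a dict of version info; missing packages are reported, not raised."""
    info = {"python": platform.python_version()}

    # Record the installed versions of the two packages seen in the traceback.
    for name in ("torch", "triton"):
        try:
            module = importlib.import_module(name)
            info[name] = getattr(module, "__version__", "unknown")
        except Exception as exc:  # broad on purpose: a broken install should not abort the report
            info[name] = f"unavailable ({type(exc).__name__})"

    # GPU details are only available when PyTorch imports and sees a CUDA device.
    try:
        import torch

        if torch.cuda.is_available():
            major, minor = torch.cuda.get_device_capability(0)
            info["cuda_runtime"] = torch.version.cuda
            info["device"] = torch.cuda.get_device_name(0)
            info["compute_capability"] = f"{major}.{minor}"
            info["torch_arch_list"] = torch.cuda.get_arch_list()
        else:
            info["device"] = "CUDA not available"
    except Exception:
        pass

    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

If the reported compute capability has no matching entry in `torch_arch_list`, the installed PyTorch build likely lacks kernels compiled for the card, which is worth ruling out before looking at the Triton kernel itself.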

Illegal Memory Access on 5090 #119