Illegal Memory Access on 5090 #119

@loaydatrain

I run into the following error when trying to run inference:

(venv) root@da3fc99b80cc:/FV/TurboDiffusion# CUDA_LAUNCH_BLOCKING=1 PYTHONPATH=turbodiffusion python turbodiffusion/inference/wan2.1_t2v_infer.py \
    --model Wan2.1-1.3B \
    --dit_path checkpoints/TurboWan2.1-T2V-1.3B-480P-quant.pth \
    --resolution 480p \
    --prompt "A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about." \
    --num_samples 1 \
    --num_steps 4 \
    --quant_linear \
    --attention_type sagesla \
    --sla_topk 0.1
Megatron-core is not installed.
[03-05 00:57:38|INFO|turbodiffusion/inference/wan2.1_t2v_infer.py:74:<module>] Computing embedding for prompt: A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.
[03-05 00:57:38|INFO|turbodiffusion/rcm/utils/umt5.py:495:__init__] loading checkpoints/models_t5_umt5-xxl-enc-bf16.pth
[03-05 00:57:55|INFO|turbodiffusion/inference/wan2.1_t2v_infer.py:79:<module>] Loading DiT model from checkpoints/TurboWan2.1-T2V-1.3B-480P-quant.pth
[03-05 00:57:55|INFO|turbodiffusion/rcm/networks/wan2pt1.py:802:enable_selective_checkpoint] Enable selective checkpoint with mm_only, for every 1 blocks. Total blocks: 30
[03-05 00:57:58|SUCCESS|turbodiffusion/inference/wan2.1_t2v_infer.py:82:<module>] Successfully loaded DiT model.
[03-05 00:57:59|INFO|turbodiffusion/rcm/tokenizers/wan2pt1.py:592:_video_vae] loading checkpoints/Wan2.1_VAE.pth
[03-05 00:57:59|INFO|turbodiffusion/inference/wan2.1_t2v_infer.py:88:<module>] Generating with prompt: A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.
Sampling:   0%|                                                                                                                         | 0/4 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/FV/TurboDiffusion/turbodiffusion/inference/wan2.1_t2v_infer.py", line 131, in <module>
    v_pred = net(x_B_C_T_H_W=x.to(**tensor_kwargs), timesteps_B_T=(t_cur.float() * ones * 1000).to(**tensor_kwargs), **condition).to(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/FV/TurboDiffusion/turbodiffusion/rcm/networks/wan2pt1.py", line 698, in forward
    x_B_L_D = block(x_B_L_D, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/distributed/algorithms/_checkpoint/checkpoint_wrapper.py", line 168, in forward
    return self.checkpoint_fn(  # type: ignore[misc]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/_compile.py", line 54, in inner
    return disable_fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1181, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/utils/checkpoint.py", line 512, in checkpoint
    ret = function(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/FV/TurboDiffusion/turbodiffusion/rcm/networks/wan2pt1.py", line 416, in forward
    x = cross_attn_ffn(x, context, context_lens, e)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/FV/TurboDiffusion/turbodiffusion/rcm/networks/wan2pt1.py", line 410, in cross_attn_ffn
    x = x + self.cross_attn(self.norm3(x), context, context_lens)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/FV/TurboDiffusion/turbodiffusion/rcm/networks/wan2pt1.py", line 292, in forward
    k = self.norm_k(self.k(context)).view(b, -1, n, d)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/FV/TurboDiffusion/turbodiffusion/ops/core.py", line 442, in forward
    return rmsnorm(x.float(), self.weight, self.eps).to(x.dtype)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/FV/TurboDiffusion/turbodiffusion/ops/core.py", line 173, in rmsnorm
    _rms_norm_fwd_fused[(triton.cdiv(M, BLOCK_M),)](  #
  File "/opt/venv/lib/python3.12/site-packages/triton/runtime/jit.py", line 370, in <lambda>
    return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/triton/runtime/jit.py", line 744, in run
    kernel.run(grid_0, grid_1, grid_2, stream, kernel.function, kernel.packed_metadata, launch_metadata,
  File "/opt/venv/lib/python3.12/site-packages/triton/backends/nvidia/driver.py", line 713, in __call__
    self.launch(gridX, gridY, gridZ, stream, function, self.launch_cooperative_grid, self.launch_pdl,
RuntimeError: Triton Error [CUDA]: an illegal memory access was encountered
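
Most of the frames in this traceback are `torch.nn.Module` and checkpoint wrappers. The project's own call chain is `wan2pt1.py` lines 698, 416, 410 and 292, then `ops/core.py` lines 442 and 173, where the Triton kernel `_rms_norm_fwd_fused` is launched. Below is a minimal sketch of pulling out just those frames from a pasted traceback. It assumes the standard CPython traceback layout; the helper name `project_frames` and the shortened `SAMPLE` text are illustrative and not part of TurboDiffusion.

```python
import re

# Matches the 'File "...", line N, in name' header of each traceback frame.
FRAME_RE = re.compile(r'^\s*File "(?P<path>[^"]+)", line (?P<line>\d+), in (?P<func>.+)$')


def project_frames(traceback_text, exclude="site-packages"):
    """Return (path, line, func) for frames whose path does not contain `exclude`."""
    frames = []
    for raw in traceback_text.splitlines():
        m = FRAME_RE.match(raw)
        if m and exclude not in m.group("path"):
            frames.append((m.group("path"), int(m.group("line")), m.group("func").strip()))
    return frames


# Shortened excerpt of the traceback above, used only to exercise the helper.
SAMPLE = '''\
Traceback (most recent call last):
  File "/FV/TurboDiffusion/turbodiffusion/rcm/networks/wan2pt1.py", line 292, in forward
    k = self.norm_k(self.k(context)).view(b, -1, n, d)
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/FV/TurboDiffusion/turbodiffusion/ops/core.py", line 173, in rmsnorm
    _rms_norm_fwd_fused[(triton.cdiv(M, BLOCK_M),)](  #
RuntimeError: Triton Error [CUDA]: an illegal memory access was encountered
'''

if __name__ == "__main__":
    for path, line, func in project_frames(SAMPLE):
        print(f"{path}:{line} in {func}")
```

Because the run used `CUDA_LAUNCH_BLOCKING=1`, kernel launches are synchronous, so the innermost project frame is likely where the fault actually occurred, not a later synchronization point.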

I am running inference on a single RTX 5090, after building TurboDiffusion and SpargeAttn from source.
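
Version details usually help with a report like this, since the failing kernel is compiled by Triton for the local GPU. Below is a minimal sketch for gathering them. It assumes the packages are installed under the distribution names `torch` and `triton`; the function names `collect_versions` and `gpu_summary` are illustrative. No output is shown here because it depends on the environment.

```python
import importlib.metadata as metadata
import platform


def collect_versions(packages=("torch", "triton")):
    """Map each package name to its installed version, or 'not installed'."""
    info = {"python": platform.python_version()}
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


def gpu_summary():
    """Return GPU name, compute capability and torch's CUDA version, or None."""
    try:
        import torch  # imported lazily so the script still runs without torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    major, minor = torch.cuda.get_device_capability(0)
    return {
        "name": torch.cuda.get_device_name(0),
        "capability": f"sm_{major}{minor}",
        "torch_cuda": torch.version.cuda,
    }


if __name__ == "__main__":
    print(collect_versions())
    print(gpu_summary())
```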
