Eval bug: Xid 8 GPU timeout crash during decode with multi-GPU inference, DFlash #65

@tangeman21

Name and Version

/home/zach/dev/beellama.cpp/build/bin/llama-cli --version
version: 10146 (e623d39)
built with GNU 13.3.0 for Linux x86_64

Operating systems

Linux

GGML backends

CUDA

Hardware

2x RTX 5060 Ti 16GB
Gigabyte B650 Gaming X AX V2: one GPU in the main slot (PCIe Gen 4 x16, running at x8), the other in the secondary slot (PCIe Gen 3 x1)

Models

https://huggingface.co/Freenixi/Abiray-Qwen3.6-27B-NVFP4-GGUF

Problem description & steps to reproduce

llama-server crashes during decoding with a CUDA launch timeout; the NVIDIA driver logs Xid 8 at the same time.

Command: /home/zach/dev/beellama.cpp/build/bin/llama-server -m "/media/zach/E7562A2674DB25F7/dev/models/Abiray-Qwen3.6-27B-NVFP4.gguf" --mmproj /media/zach/E7562A2674DB25F7/dev/models/Abiray-NVFP4-mmproj.gguf --port 8018 -c 125000 --batch-size 2048 --ubatch-size 512 --temp 0.6 --top_p 0.95 --top_k 20 --min_p 0.0 --presence_penalty 0.0 --repeat_penalty 1.0 --jinja --chat-template-file /home/zach/Downloads/chat_template.jinja --no-mmap --mlock --no-host -np 1 -ctk q8_0 -ctv q8_0 --host 0.0.0.0 --chat-template-kwargs '{"preserve_thinking":true}' --metrics --timeout 600000 --ctx-checkpoints 64 --spec-draft-model /media/zach/E7562A2674DB25F7/dev/models/Qwen3.6-27B-DFlash-Q5_k_m.gguf --spec-type dflash --spec-dflash-cross-ctx 1024 --spec-draft-ngl all -kvu -ngl all --spec-draft-device CUDA0 -ts .34,.66

The crash happens during decode and is more likely during a long decode, though that may simply be because a longer run has more chances to hit it.

It seems to always happen on the second GPU.

First Bad Commit

No response

Relevant log output

Server logs:
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[126310]: 15.50.193.997 I slot print_timing: id  0 | task 6145 | n_decoded =  10567, tg =  53.10 t/s
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[126310]: 15.50.516.523 E CUDA error: the launch timed out and was terminated
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[126310]: 15.50.516.526 E   current device: 0, in function ggml_backend_cuda_synchronize at /home/zach/dev/beellama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:3298
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[126310]: 15.50.516.527 E   cudaStreamSynchronize(cuda_ctx->stream())
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[126310]: /home/zach/dev/beellama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:105: CUDA error
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126340]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126339]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126338]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126337]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126336]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126335]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126334]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126333]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126332]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126331]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126330]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126329]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126328]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126327]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126326]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126325]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126324]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126323]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126322]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126321]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126320]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126313]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126312]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: This GDB supports auto-downloading debuginfo from the following URLs:
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]:   <https://debuginfod.ubuntu.com>
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: Enable debuginfod for this session? (y or [n]) [answered N; input not from terminal]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: Debuginfod has been disabled.
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [Thread debugging using libthread_db enabled]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: 0x0000752078710813 in __GI___wait4 (pid=147820, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: warning: 30        ../sysdeps/unix/sysv/linux/wait4.c: No such file or directory
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: #0  0x0000752078710813 in __GI___wait4 (pid=147820, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: 30        in ../sysdeps/unix/sysv/linux/wait4.c
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: #1  0x0000752078d28903 in ggml_print_backtrace () from /home/zach/dev/beellama.cpp/build/bin/libggml-base.so.0
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: #2  0x0000752078d28aab in ggml_abort () from /home/zach/dev/beellama.cpp/build/bin/libggml-base.so.0
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: #3  0x0000752071c17297 in ggml_cuda_error(char const*, char const*, char const*, int, char const*) () from /home/zach/dev/beellama.cpp/build/bin/libggml-cuda.so.0
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: #4  0x0000752071c18988 in ggml_backend_cuda_synchronize(ggml_backend*) () from /home/zach/dev/beellama.cpp/build/bin/libggml-cuda.so.0
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: #5  0x0000752078d42d1e in ggml_backend_sched_synchronize () from /home/zach/dev/beellama.cpp/build/bin/libggml-base.so.0
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: #6  0x0000752077d0df34 in llama_context::decode(llama_batch const&) () from /home/zach/dev/beellama.cpp/build/bin/libllama.so.0
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: #7  0x0000752077d1195f in llama_decode () from /home/zach/dev/beellama.cpp/build/bin/libllama.so.0
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: #8  0x0000752078f88e5e in server_context_impl::update_slots() () from /home/zach/dev/beellama.cpp/build/bin/libllama-server-impl.so
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: #9  0x0000752079030901 in server_queue::start_loop(long) () from /home/zach/dev/beellama.cpp/build/bin/libllama-server-impl.so
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: #10 0x0000752078ed050c in llama_server(int, char**) () from /home/zach/dev/beellama.cpp/build/bin/libllama-server-impl.so
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: #11 0x000075207862a1ca in __libc_start_call_main (main=main@entry=0x647d654a8270 <main>, argc=argc@entry=61, argv=argv@entry=0x7fff02c18f58) at ../sysdeps/nptl/libc_start_call_main.h:58
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: warning: 58        ../sysdeps/nptl/libc_start_call_main.h: No such file or directory
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: #12 0x000075207862a28b in __libc_start_main_impl (main=0x647d654a8270 <main>, argc=61, argv=0x7fff02c18f58, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fff02c18f48) at ../csu/libc-start.c:360
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: warning: 360        ../csu/libc-start.c: No such file or directory
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: #13 0x0000647d654a82a5 in _start ()
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [Inferior 1 (process 126310) detached]
Jun 10 06:58:12 zach-B650-GAMING-X-AX-V2 systemd[2122]: llama-server.service: Main process exited, code=dumped, status=6/ABRT
Jun 10 06:58:12 zach-B650-GAMING-X-AX-V2 systemd[2122]: llama-server.service: Failed with result 'core-dump'.

dmesg logs:
[ 3741.043198] NVRM: krcWatchdog_IMPL: RC watchdog: GPU is probably locked!  Notify Timeout Seconds: 7
[ 3741.048033] NVRM: Xid (PCI:0000:05:00): 8, pid=89904, name=llama-server, channel 0x00000005
[ 4229.453345] NVRM: krcWatchdog_IMPL: RC watchdog: GPU is probably locked!  Notify Timeout Seconds: 7
[ 4229.457373] NVRM: Xid (PCI:0000:05:00): 8, pid=115763, name=llama-server, channel 0x00000003
[ 5183.804561] NVRM: krcWatchdog_IMPL: RC watchdog: GPU is probably locked!  Notify Timeout Seconds: 7
[ 5183.809093] NVRM: Xid (PCI:0000:05:00): 8, pid=126310, name=llama-server, channel 0x00000005
[ 5815.603405] NVRM: krcWatchdog_IMPL: RC watchdog: GPU is probably locked!  Notify Timeout Seconds: 7
[ 5815.608015] NVRM: Xid (PCI:0000:05:00): 8, pid=147910, name=llama-server, channel 0x00000003
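
For anyone triaging similar reports, here is a minimal Python sketch (not part of llama.cpp or this fork) that parses Xid lines out of dmesg text, counts them per PCI address, and computes the time between events. The regular expression is an assumption based only on the four Xid lines above; other driver versions may format these lines differently. The function names `parse_xid_events` and `summarize` are hypothetical. Note that it groups by PCI address only: it does not map `PCI:0000:05:00` to a CUDA device index, so it cannot by itself reconcile the "current device: 0" line in the server log with the observation that the second GPU is affected.

```python
import re
from collections import Counter

# Assumed format, taken from the dmesg excerpt in this report:
# [ 3741.048033] NVRM: Xid (PCI:0000:05:00): 8, pid=89904, name=llama-server, channel 0x00000005
XID_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+NVRM: Xid \(PCI:(?P<pci>[0-9a-fA-F:.]+)\): "
    r"(?P<xid>\d+), pid=(?P<pid>\d+), name=(?P<name>[^,]+), "
    r"channel (?P<channel>0x[0-9a-fA-F]+)"
)


def parse_xid_events(dmesg_text):
    """Return one dict per Xid line found in dmesg_text, in input order."""
    events = []
    for line in dmesg_text.splitlines():
        m = XID_RE.search(line)
        if m is None:
            continue  # watchdog lines and unrelated kernel messages are skipped
        events.append({
            "ts": float(m.group("ts")),      # seconds since boot
            "pci": m.group("pci"),
            "xid": int(m.group("xid")),
            "pid": int(m.group("pid")),
            "name": m.group("name"),
            "channel": m.group("channel"),
        })
    return events


def summarize(events):
    """Count events per (PCI address, Xid code) and list gaps between events in seconds."""
    by_device = Counter((e["pci"], e["xid"]) for e in events)
    gaps = [round(b["ts"] - a["ts"], 1) for a, b in zip(events, events[1:])]
    return by_device, gaps


# The dmesg lines quoted in this report, used as sample input.
SAMPLE = """\
[ 3741.043198] NVRM: krcWatchdog_IMPL: RC watchdog: GPU is probably locked!  Notify Timeout Seconds: 7
[ 3741.048033] NVRM: Xid (PCI:0000:05:00): 8, pid=89904, name=llama-server, channel 0x00000005
[ 4229.453345] NVRM: krcWatchdog_IMPL: RC watchdog: GPU is probably locked!  Notify Timeout Seconds: 7
[ 4229.457373] NVRM: Xid (PCI:0000:05:00): 8, pid=115763, name=llama-server, channel 0x00000003
[ 5183.804561] NVRM: krcWatchdog_IMPL: RC watchdog: GPU is probably locked!  Notify Timeout Seconds: 7
[ 5183.809093] NVRM: Xid (PCI:0000:05:00): 8, pid=126310, name=llama-server, channel 0x00000005
[ 5815.603405] NVRM: krcWatchdog_IMPL: RC watchdog: GPU is probably locked!  Notify Timeout Seconds: 7
[ 5815.608015] NVRM: Xid (PCI:0000:05:00): 8, pid=147910, name=llama-server, channel 0x00000003
"""

if __name__ == "__main__":
    events = parse_xid_events(SAMPLE)
    by_device, gaps = summarize(events)
    for (pci, xid), count in by_device.items():
        print(f"PCI {pci}: Xid {xid} x{count}")
    print("seconds between events:", gaps)
```

On a live system the same functions could be fed the output of `dmesg` instead of `SAMPLE`. All four events in the excerpt fall on the same PCI address, which is consistent with the failure being tied to one card rather than alternating between the two.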
