Eval bug: Xid 8 GPU timeout crash during decode with multi-GPU inference, DFlash

### Name and Version

/home/zach/dev/beellama.cpp/build/bin/llama-cli --version
version: 10146 (e623d3984)
built with GNU 13.3.0 for Linux x86_64


### Operating systems

Linux

### GGML backends

CUDA

### Hardware

2x RTX 5060 Ti 16GB
Gigabyte B650 Gaming X AX V2: one GPU in the main slot (Gen 4 x16, running at x8), the other in the secondary slot (Gen 3 x1)

### Models

https://huggingface.co/Freenixi/Abiray-Qwen3.6-27B-NVFP4-GGUF

### Problem description & steps to reproduce

llama-server crashes during decoding with a CUDA launch timeout (Xid 8 in dmesg).

Command: /home/zach/dev/beellama.cpp/build/bin/llama-server -m "/media/zach/E7562A2674DB25F7/dev/models/Abiray-Qwen3.6-27B-NVFP4.gguf" --mmproj /media/zach/E7562A2674DB25F7/dev/models/Abiray-NVFP4-mmproj.gguf --port 8018 -c 125000 --batch-size 2048 --ubatch-size 512 --temp 0.6 --top_p 0.95 --top_k 20 --min_p 0.0 --presence_penalty 0.0 --repeat_penalty 1.0 --jinja --chat-template-file /home/zach/Downloads/chat_template.jinja --no-mmap --mlock --no-host -np 1 -ctk q8_0 -ctv q8_0 --host 0.0.0.0 --chat-template-kwargs '{"preserve_thinking":true}' --metrics --timeout 600000 --ctx-checkpoints 64 --spec-draft-model /media/zach/E7562A2674DB25F7/dev/models/Qwen3.6-27B-DFlash-Q5_k_m.gguf --spec-type dflash --spec-dflash-cross-ctx 1024 --spec-draft-ngl all -kvu -ngl all --spec-draft-device CUDA0 -ts .34,.66

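The crash happens during decode and is more likely during a long generation, though that could simply be because a longer run has more chances to hit it.

It seems to always happen on the second GPU: every Xid 8 in the dmesg output is reported for the same PCI address (PCI:0000:05:00).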
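
To confirm which physical card sits at the PCI address in the Xid lines, the address can be matched against `nvidia-smi`. The following is a minimal sketch, not something I have run on this machine: `parse_gpu_table`, `short_pci` and `query_gpus` are hypothetical helper names, and the rows in the usage comment are illustrative placeholders rather than output from this system. Note that the index printed by `nvidia-smi` follows PCI bus order, while CUDA's default device order is not guaranteed to match it unless `CUDA_DEVICE_ORDER=PCI_BUS_ID` is set, so the "current device" number in the server log may refer to a different card than the same index in `nvidia-smi`.

```python
import subprocess


def short_pci(addr):
    """Reduce '00000000:05:00.0' or '0000:05:00' to 'bus:device', e.g. '05:00'."""
    addr = addr.strip().split(".")[0]      # drop the PCI function suffix if present
    parts = addr.split(":")
    return ":".join(parts[-2:]).lower()    # keep only bus and device


def parse_gpu_table(csv_text):
    """Parse 'index, pci.bus_id, name' rows into {bus:device: (index, name)}."""
    table = {}
    for row in csv_text.strip().splitlines():
        index, bus_id, name = [field.strip() for field in row.split(",", 2)]
        table[short_pci(bus_id)] = (int(index), name)
    return table


def query_gpus():
    """Ask nvidia-smi for index, PCI bus id and name of every GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,pci.bus_id,name", "--format=csv,noheader"],
        check=True, capture_output=True, text=True,
    )
    return parse_gpu_table(result.stdout)


# Usage on the affected machine (requires the NVIDIA driver):
#   gpus = query_gpus()
#   print(gpus.get(short_pci("0000:05:00")))
```

Parsing is kept separate from the `nvidia-smi` call so the lookup can also be done on saved output.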

### First Bad Commit

_No response_

### Relevant log output

<details>
<summary>Logs</summary>


```console
Server logs:
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[126310]: 15.50.193.997 I slot print_timing: id  0 | task 6145 | n_decoded =  10567, tg =  53.10 t/s
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[126310]: 15.50.516.523 E CUDA error: the launch timed out and was terminated
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[126310]: 15.50.516.526 E   current device: 0, in function ggml_backend_cuda_synchronize at /home/zach/dev/beellama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:3298
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[126310]: 15.50.516.527 E   cudaStreamSynchronize(cuda_ctx->stream())
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[126310]: /home/zach/dev/beellama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:105: CUDA error
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126340]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126339]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126338]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126337]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126336]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126335]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126334]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126333]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126332]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126331]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126330]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126329]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126328]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126327]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126326]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126325]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126324]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126323]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126322]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126321]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126320]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126313]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [New LWP 126312]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: This GDB supports auto-downloading debuginfo from the following URLs:
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]:   <https://debuginfod.ubuntu.com>
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: Enable debuginfod for this session? (y or [n]) [answered N; input not from terminal]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: Debuginfod has been disabled.
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [Thread debugging using libthread_db enabled]
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: 0x0000752078710813 in __GI___wait4 (pid=147820, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: warning: 30        ../sysdeps/unix/sysv/linux/wait4.c: No such file or directory
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: #0  0x0000752078710813 in __GI___wait4 (pid=147820, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: 30        in ../sysdeps/unix/sysv/linux/wait4.c
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: #1  0x0000752078d28903 in ggml_print_backtrace () from /home/zach/dev/beellama.cpp/build/bin/libggml-base.so.0
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: #2  0x0000752078d28aab in ggml_abort () from /home/zach/dev/beellama.cpp/build/bin/libggml-base.so.0
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: #3  0x0000752071c17297 in ggml_cuda_error(char const*, char const*, char const*, int, char const*) () from /home/zach/dev/beellama.cpp/build/bin/libggml-cuda.so.0
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: #4  0x0000752071c18988 in ggml_backend_cuda_synchronize(ggml_backend*) () from /home/zach/dev/beellama.cpp/build/bin/libggml-cuda.so.0
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: #5  0x0000752078d42d1e in ggml_backend_sched_synchronize () from /home/zach/dev/beellama.cpp/build/bin/libggml-base.so.0
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: #6  0x0000752077d0df34 in llama_context::decode(llama_batch const&) () from /home/zach/dev/beellama.cpp/build/bin/libllama.so.0
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: #7  0x0000752077d1195f in llama_decode () from /home/zach/dev/beellama.cpp/build/bin/libllama.so.0
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: #8  0x0000752078f88e5e in server_context_impl::update_slots() () from /home/zach/dev/beellama.cpp/build/bin/libllama-server-impl.so
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: #9  0x0000752079030901 in server_queue::start_loop(long) () from /home/zach/dev/beellama.cpp/build/bin/libllama-server-impl.so
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: #10 0x0000752078ed050c in llama_server(int, char**) () from /home/zach/dev/beellama.cpp/build/bin/libllama-server-impl.so
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: #11 0x000075207862a1ca in __libc_start_call_main (main=main@entry=0x647d654a8270 <main>, argc=argc@entry=61, argv=argv@entry=0x7fff02c18f58) at ../sysdeps/nptl/libc_start_call_main.h:58
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: warning: 58        ../sysdeps/nptl/libc_start_call_main.h: No such file or directory
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: #12 0x000075207862a28b in __libc_start_main_impl (main=0x647d654a8270 <main>, argc=61, argv=0x7fff02c18f58, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fff02c18f48) at ../csu/libc-start.c:360
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: warning: 360        ../csu/libc-start.c: No such file or directory
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: #13 0x0000647d654a82a5 in _start ()
Jun 10 06:58:10 zach-B650-GAMING-X-AX-V2 llama-server[147820]: [Inferior 1 (process 126310) detached]
Jun 10 06:58:12 zach-B650-GAMING-X-AX-V2 systemd[2122]: llama-server.service: Main process exited, code=dumped, status=6/ABRT
Jun 10 06:58:12 zach-B650-GAMING-X-AX-V2 systemd[2122]: llama-server.service: Failed with result 'core-dump'.

dmesg logs:
[ 3741.043198] NVRM: krcWatchdog_IMPL: RC watchdog: GPU is probably locked!  Notify Timeout Seconds: 7
[ 3741.048033] NVRM: Xid (PCI:0000:05:00): 8, pid=89904, name=llama-server, channel 0x00000005
[ 4229.453345] NVRM: krcWatchdog_IMPL: RC watchdog: GPU is probably locked!  Notify Timeout Seconds: 7
[ 4229.457373] NVRM: Xid (PCI:0000:05:00): 8, pid=115763, name=llama-server, channel 0x00000003
[ 5183.804561] NVRM: krcWatchdog_IMPL: RC watchdog: GPU is probably locked!  Notify Timeout Seconds: 7
[ 5183.809093] NVRM: Xid (PCI:0000:05:00): 8, pid=126310, name=llama-server, channel 0x00000005
[ 5815.603405] NVRM: krcWatchdog_IMPL: RC watchdog: GPU is probably locked!  Notify Timeout Seconds: 7
[ 5815.608015] NVRM: Xid (PCI:0000:05:00): 8, pid=147910, name=llama-server, channel 0x00000003
```
</details>
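
The dmesg lines above can be tallied per PCI device to check whether the Xid events are confined to one GPU. This is a small sketch under stated assumptions: `tally_xids` is a hypothetical helper, and the regular expression only targets the `NVRM: Xid (PCI:...)` line shape seen in this report, so other driver versions may format the line differently.

```python
import re
from collections import Counter

# Matches lines such as:
# [ 3741.048033] NVRM: Xid (PCI:0000:05:00): 8, pid=89904, name=llama-server, channel 0x00000005
XID_RE = re.compile(
    r"NVRM: Xid \(PCI:(?P<pci>[0-9a-fA-F:.]+)\): (?P<xid>\d+), pid=(?P<pid>\d+)"
)


def tally_xids(dmesg_text):
    """Count Xid events per (PCI address, Xid code) pair."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = XID_RE.search(line)
        if match:
            counts[(match.group("pci"), int(match.group("xid")))] += 1
    return counts


# The four Xid lines from the dmesg section of this report.
SAMPLE = """\
[ 3741.048033] NVRM: Xid (PCI:0000:05:00): 8, pid=89904, name=llama-server, channel 0x00000005
[ 4229.457373] NVRM: Xid (PCI:0000:05:00): 8, pid=115763, name=llama-server, channel 0x00000003
[ 5183.809093] NVRM: Xid (PCI:0000:05:00): 8, pid=126310, name=llama-server, channel 0x00000005
[ 5815.608015] NVRM: Xid (PCI:0000:05:00): 8, pid=147910, name=llama-server, channel 0x00000003
"""

for (pci, xid), n in sorted(tally_xids(SAMPLE).items()):
    print(f"{pci} Xid {xid}: {n} event(s)")
```

For the lines in this report, all four events fall on `0000:05:00` with Xid 8. On a live system the input could come from `dmesg` or `journalctl -k` instead of the embedded sample.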
