eagle3: add qwen3.5 4B 9B 35B-A3B support #21437
36330 wants to merge 18 commits into ggml-org:master from
Conversation
EAGLE3 is an encoder-decoder based speculative decoding method:
- Extracts features from the target model at specific layers
- Uses a feature fusion layer to compress target features
- Generates draft tokens with a single-layer decoder
- Maps the draft vocabulary to the target vocabulary via the d2t tensor

Key changes:
- Add LLM_ARCH_EAGLE3 architecture
- Add EAGLE3 encoder/decoder graph (src/models/eagle3.cpp)
- Add feature extraction from target model layers
- Add g_embeddings handling for decoder input
- Add GGML_TENSOR_FLAG_SYNC for GPU synchronization
- Add --eagle3 flag for the speculative-simple example
- Add EAGLE3 model conversion in convert_hf_to_gguf.py
Hi @36330, thanks for your contribution! Per our contribution guidelines, the automated PR checker found the following issue(s) that need your attention:
Please note that maintainers reserve the right to make final decisions on PRs. If you believe there is a mistake, please comment below.
how does this compare to MTP?
"extends the EAGLE3 implementation" - what does it mean? Has EAGLE3 ever been implemented in llama.cpp? |
EAGLE3 has already been implemented in llama.cpp. However, because of the particulars of the qwen3.5 linear attention architecture, some adaptations were needed.
You're pushing to the wrong branch; I suppose this is not ready yet.
I successfully merged the latest version. |
this looks like a draft |
qwen3.5 is not supported
maybe you wanted to push into ichbinhandsome:eagle3-adapt-new-arch instead of into master?
Summary
This PR extends the EAGLE3 implementation with recurrent verification-state support.
Compared with the earlier EAGLE3 work, this version keeps target-side verification in a single batched decode path and adds the recurrent state handling needed to make that flow work correctly for hybrid / recurrent models.
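The recurrent-state handling mentioned above can be sketched as follows. With a regular KV cache, rejected draft positions can be dropped with a ranged removal (the seq_rm() path); a recurrent model instead carries a single rolling state, so rejection requires restoring a checkpoint taken before the draft batch was decoded. This is an illustrative sketch under those assumptions, not the actual llama.cpp API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a recurrent state with checkpointing: the state is a
// rolling vector, so rolling back rejected draft tokens means restoring a
// copy saved before verification, not removing cache entries.
struct recurrent_state {
    std::vector<float> s;           // current rolling state
    std::vector<float> checkpoint;  // copy saved before verifying drafts

    void save()    { checkpoint = s; }
    void restore() { s = checkpoint; }
};

// Returns the length of the accepted prefix of the draft batch; on the first
// rejection the state is rolled back so the rejected tail leaves no trace.
static size_t verify_drafts(recurrent_state & st, const std::vector<bool> & accepted) {
    st.save();
    size_t n_accept = 0;
    for (bool ok : accepted) {
        if (!ok) {
            st.restore();  // discard state advanced past the rejected token
            break;
        }
        n_accept++;
    }
    return n_accept;
}
```

The key design point is that save/restore replaces the ranged cache removal a transformer-only model would use, which is why the hybrid and recurrent memory files are touched in this PR.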
Main changes:
- speculative-simple
- seq_rm() path

Details
Core files changed:
- examples/speculative-simple/speculative-simple.cpp
- include/llama.h
- src/llama-context.cpp
- src/llama-memory.h
- src/llama-memory-hybrid.h
- src/llama-memory-hybrid.cpp
- src/llama-memory-recurrent.h
- src/llama-memory-recurrent.cpp
- src/models/qwen35.cpp
- src/models/qwen35moe.cpp

Additional related updates:
- convert_hf_to_gguf.py
- examples/speculative/speculative.cpp
- src/models/qwen3vl-moe.cpp

Test examples:
Qwen3.5-9B-BF16.gguf + eagle3-qwen3.5-9b-eagle.gguf:
- draft1 accept = 61.463%
- encoded 26 tokens in 0.390 seconds, speed: 66.656 t/s
- decoded 259 tokens in 15.686 seconds, speed: 16.512 t/s

Without EAGLE3: [ Prompt: 66.2 t/s | Generation: 9.8 t/s ]

Speedup: 1.68x
Qwen3.5-4B-Q4_K_M.gguf + eagle3-qwen35-4b-draft-Q4_K_M.gguf:
- draft2 accept = 53.140%
- encoded 26 tokens in 0.080 seconds, speed: 326.052 t/s
- decoded 257 tokens in 3.047 seconds, speed: 84.339 t/s

Without EAGLE3: [ Prompt: 437.6 t/s | Generation: 64.2 t/s ]

Speedup: 1.31x