ggml : add GGML_OP_GATHER for DeepSeek Sparse Attention (DSA) #21149 #21458
LilySu wants to merge 4 commits into ggml-org:master
Conversation
- …gather_f32 in ggml-cpu/ops.cpp
- …ard dispatch switch statement in ggml-cpu.c
Hi @LilySu, thanks for your contribution! Per our contribution guidelines, the automated PR checker found the following issue(s) that need your attention:
Please note that maintainers reserve the right to make final decisions on PRs. If you believe there is a mistake, please comment below.
I don't know whether it would be a good idea, but could this operation be accomplished with set_rows and a row length of 1?
I made the modifications manually, following the existing patterns from the Scatter and Gated Delta Net implementations that preceded this one. AI assisted by answering my questions about the repo, proofreading my implementation, and making suggestions.
set_rows writes values to indexed positions, whereas this operation needs to read them. get_rows is the equivalent read operation, but it currently applies the same indices across all batches. Calling get_rows on a per-batch basis would mean one operation per batch in the compute graph, so the graph topology would change with batch size, which reduces graph-reuse opportunities. In addition, get_rows requires the indices tensor to have size 1 in its outermost dimension; modifying get_rows would change the output tensor shape, and changing the runtime assertion GGML_ASSERT(b->ne[3] == 1) would require an update to every backend. Looking at other frameworks, PyTorch has torch.gather for per-position indices, distinct from torch.index_select, and TensorRT-LLM likewise distinguishes gather from index_select.
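The distinction above can be sketched in plain C. This is a minimal reference sketch of the two indexing semantics, not the ggml implementation; the function names gather_f32 and get_rows_like_f32 are illustrative, and the layout assumes the innermost (ne[0]) dimension is contiguous per batch, as in ggml.

```c
#include <assert.h>

/* Per-batch indices (gather semantics):
   dst[i, b] = src[idx[i, b], b] -- each batch b has its own index column. */
static void gather_f32(const float *src, const int *idx, float *dst,
                       int n_idx, int n_src, int n_batch) {
    for (int b = 0; b < n_batch; ++b) {
        for (int i = 0; i < n_idx; ++i) {
            int j = idx[b * n_idx + i];
            assert(j >= 0 && j < n_src);          /* bounds check */
            dst[b * n_idx + i] = src[b * n_src + j];
        }
    }
}

/* Shared indices (get_rows-style semantics):
   dst[i, b] = src[idx[i], b] -- the same index list is applied to every batch. */
static void get_rows_like_f32(const float *src, const int *idx, float *dst,
                              int n_idx, int n_src, int n_batch) {
    for (int b = 0; b < n_batch; ++b) {
        for (int i = 0; i < n_idx; ++i) {
            int j = idx[i];
            assert(j >= 0 && j < n_src);
            dst[b * n_idx + i] = src[b * n_src + j];
        }
    }
}
```

With a shared index list, every batch selects the same rows; gather lets batch 0 and batch 1 pick different positions, which is what DSA's per-batch top-k requires.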
Overview
This PR implements a CPU-only (f32/f16) GGML_OP_GATHER, which enables extracting mask[idx, b], where idx = top_k[i, b], in a single op.
Why DSA needs GGML_OP_GATHER:
scatter cannot read values from indexed positions; it can only write constants.
GGML_OP_FILL would not work either: ggml_fill writes into positions, which is the wrong direction (DSA needs to read).
What It Does:
ggml_gather(mask, top_k) extracts the mask entries at top_k positions, producing a smaller [n_top_k, n_batch, 1, n_stream] tensor instead of operating on the full [n_kv, n_batch, 1, n_stream] mask.
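The shape behavior described above can be sketched over all four ggml dimensions. This is a hypothetical reference loop, not the PR's kernel; the name gather_4d_f32 is illustrative, and contiguous layout with ne[0] innermost is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* dst[i, b, 0, s] = mask[top_k[i, b, 0, s], b, 0, s]
   mask:  [n_kv,    n_batch, 1, n_stream]
   top_k: [n_top_k, n_batch, 1, n_stream]
   dst:   [n_top_k, n_batch, 1, n_stream]  -- same shape as top_k */
static void gather_4d_f32(const float *mask, const int32_t *top_k, float *dst,
                          int64_t n_top_k, int64_t n_kv,
                          int64_t n_batch, int64_t n_stream) {
    for (int64_t s = 0; s < n_stream; ++s) {
        for (int64_t b = 0; b < n_batch; ++b) {
            for (int64_t i = 0; i < n_top_k; ++i) {
                int64_t j = top_k[(s * n_batch + b) * n_top_k + i];
                assert(j >= 0 && j < n_kv);  /* index must land in the mask row */
                dst [(s * n_batch + b) * n_top_k + i] =
                mask[(s * n_batch + b) * n_kv    + j];
            }
        }
    }
}
```

Note the output inherits its shape from the index tensor, so with n_top_k much smaller than n_kv (e.g. 2048 vs ~100k), downstream attention touches far less data.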
CPU-only for now, following the contribution guideline of CPU first, with backend support in follow-ups.
Verification:
Additional information
This op was identified as an optimization path in PR #21149 (DeepSeek V3.2 DSA support). The current implementation uses simple masking — functional but inefficient, as it still computes attention over all ~100k positions.
GGML_OP_GATHER would enable the extraction approach ggerganov suggested: selecting only the top-k K/V rows before attention, which fairydreaming noted requires per-batch indexing that ggml_get_rows does not support. Beyond DeepSeek V3.2, this op will benefit other sparse attention architectures such as GLM5; as models increasingly adopt top-k selection for long-context efficiency, ggml_gather becomes foundational infrastructure.
Requirements