Pull requests: ggml-org/llama.cpp
llama-quant : overlap compute and write with double buffering
#21507
opened Apr 6, 2026 by
nuri-yoo
6 tasks done
convert : set "add bos" == True for Gemma 4
python
python script changes
#21500
opened Apr 6, 2026 by
ggerganov
docs: add hunyuan-ocr gguf, also add test [no ci]
documentation
Improvements or additions to documentation
examples
#21490
opened Apr 5, 2026 by
ngxson
mtmd: fit_params now take into account mmproj
examples
server
#21489
opened Apr 5, 2026 by
ngxson
vocab : add byte token handling to BPE detokenizer for Gemma4
#21488
opened Apr 5, 2026 by
aldehir
console: fix stripping of \n in multiline input
#21485
opened Apr 5, 2026 by
bipinyadav3175
server : handle unsuccessful sink.write in chunked stream provider
examples
server
#21478
opened Apr 5, 2026 by
lainon1
server: add null check for context to prevent segfault on init failure
examples
server
#21477
opened Apr 5, 2026 by
Anirudh171202
gguf-py: Fix lazy tensor handling for keyword arguments
python
python script changes
#21476
opened Apr 5, 2026 by
lainon1
llama-quant: use LLM_KV constants instead of hardcoded strings
#21475
opened Apr 5, 2026 by
lainon1
CUDA: make cuda graphs props check faster
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
#21472
opened Apr 5, 2026 by
am17an
ggml : fix repeat_back assert with non-contiguous gradients
ggml
changes relating to the ggml tensor library for machine learning
#21467
opened Apr 5, 2026 by
RealOrko
ggml : add GGML_OP_GATHER for DeepSeek Sparse Attention (DSA) #21149
ggml
changes relating to the ggml tensor library for machine learning
testing
Everything test related
#21458
opened Apr 5, 2026 by
LilySu
vulkan: Support GGML_TYPE_NVFP4
ggml
changes relating to the ggml tensor library for machine learning
Vulkan
Issues specific to the Vulkan backend
#21455
opened Apr 5, 2026 by
jeffbolznv
metal : add GATED_LINEAR_ATTN op
Apple Metal
https://en.wikipedia.org/wiki/Metal_(API)
documentation
Improvements or additions to documentation
ggml
changes relating to the ggml tensor library for machine learning
testing
Everything test related
#21452
opened Apr 5, 2026 by
TheTom
Gemma 4: move some computations to BF16
examples
ggml
changes relating to the ggml tensor library for machine learning
model
Model specific
Nvidia GPU
Issues specific to Nvidia GPUs
python
python script changes
metal: speed up Qwen3-VL image encoding on large images by ~11%
Apple Metal
https://en.wikipedia.org/wiki/Metal_(API)
ggml
changes relating to the ggml tensor library for machine learning
#21443
opened Apr 4, 2026 by
Avidanborisov
fix(gemma4): handle nullable type arrays
testing
Everything test related
#21433
opened Apr 4, 2026 by
gerstnr
vulkan: Tweak Xe2 warptile configuration
ggml
changes relating to the ggml tensor library for machine learning
Vulkan
Issues specific to the Vulkan backend
#21431
opened Apr 4, 2026 by
TheBlueMatt
mtmd: add Gemma 4 audio conformer encoder support
documentation
Improvements or additions to documentation
examples
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
testing
Everything test related
#21421
opened Apr 4, 2026 by
stephencox-ict
9 tasks done