feat: Use xllamacpp to allow batching tasks and return reasoning content by marcelklehr · Pull Request #258 · nextcloud/llm2

marcelklehr · 2026-06-18T07:30:13Z

Switches from llama-cpp-python to xllamacpp, a thinner wrapper
Makes processing async to allow parallel processing
Returns reasoning content for all task types
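
For readers unfamiliar with the pattern, the async change can be pictured roughly as follows. This is a minimal sketch, not the code in this PR: `process_task`, `run_batch`, and the `parallel` limit are hypothetical names, and the sleep stands in for the actual model call. It also shows the distinction between a task that is only queued and one that is actually running.

```python
import asyncio


async def process_task(task_id: int, slots: asyncio.Semaphore, log: list) -> str:
    # Hypothetical stand-in for one inference request; not llm2's real code.
    log.append((task_id, "queued"))
    async with slots:
        # The task only counts as running once it holds a slot.
        log.append((task_id, "running"))
        await asyncio.sleep(0.01)  # simulated model call
        return f"result-{task_id}"


async def run_batch(task_ids, parallel: int = 2):
    # At most `parallel` tasks run at once; the rest wait in the queue.
    slots = asyncio.Semaphore(parallel)
    log: list = []
    results = await asyncio.gather(
        *(process_task(t, slots, log) for t in task_ids)
    )
    return results, log


if __name__ == "__main__":
    results, log = asyncio.run(run_batch([1, 2, 3]))
    print(results)
```

`asyncio.gather` returns results in the order the tasks were passed in, regardless of which one finishes first.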

🤖 AI (if applicable)

The content of this PR was partly or fully generated using AI

Signed-off-by: Marcel Klehr <mklehr@gmx.net>

julien-nc

I checked out the branch, loaded the venv and ran poetry install. When processing a task, the app throws this:

RuntimeError: llama-server exited with code 1 before becoming ready. Last output:
Traceback (most recent call last):
  File "<string>", line 2, in <module>
    import xllamacpp as xlc
  File "/home/julien/vcs/git/llm2/.venv/lib/python3.14/site-packages/xllamacpp/__init__.py", line 15, in <module>
    from .xllamacpp import *
ImportError: libcudart.so.12: cannot open shared object file: No such file or directory

Is CUDA a strong requirement of xllamacpp?
I'm using COMPUTE_DEVICE=CPU btw.

Also, it would be nice to return the reasoning in the non-chat providers as well. Wdyt?

marcelklehr · 2026-06-23T08:57:49Z

Is CUDA a strong requirement of xllamacpp?

It's not, but I haven't tested without CUDA yet. Good point!
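
A preflight check along these lines could turn the ImportError above into a clearer message. It is only a sketch: `can_load_shared_library` and `pick_compute_device` are hypothetical helpers, not part of llm2 or xllamacpp, and the check does not change which xllamacpp wheel is installed. A CUDA-linked build will still fail to import on a machine without the CUDA runtime.

```python
import ctypes


def can_load_shared_library(name: str) -> bool:
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        return False


def pick_compute_device(requested: str) -> str:
    # Hypothetical helper: fall back to CPU when CUDA was requested but the
    # CUDA runtime named in the traceback (libcudart.so.12) cannot be loaded.
    device = requested.upper()
    if device == "CUDA" and not can_load_shared_library("libcudart.so.12"):
        return "CPU"
    return device


if __name__ == "__main__":
    print(pick_compute_device("cpu"))
```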

Signed-off-by: Marcel Klehr <mklehr@gmx.net>

julien-nc

With Olmo, on CPU, I get this error with all task types:

0.02.736.601 E srv          init: init: chat template parsing error: Unable to generate parser for this template. Automatic parser generation failed:
------------
While executing FilterExpression at line 6, column 86 in source:
... none -%}{{- '<functions>' -}}{{- tools | tojson -}}{{- '</functions>' -}}{%- el...
                                           ^
Error: Unknown (built-in) filter 'tojson' for type Undefined (hint: 'tools')
0.02.736.603 E srv          init: init: please consider disabling jinja via --no-jinja, or use a custom chat template via --chat-template
0.02.736.604 E srv          init: init: for example: --no-jinja --chat-template chatml
0.02.736.617 I srv    operator(): operator(): cleaning up before exit...
0.02.737.377 E srv          init: exiting due to model loading error
Traceback (most recent call last):
  File "<string>", line 15, in <module>
    server = xlc.Server(p)  # starts the C++ server in a background thread
  File "src/xllamacpp/xllamacpp.pyx", line 3070, in xllamacpp.xllamacpp.Server.__cinit__
RuntimeError: Failed to init server, please check the input params.

Because we now stream the reasoning content, the message generation placeholder disappears while there is still no content to display. This will be fixed by adding the reasoning support in the assistant UI but right now it feels weird.

Other than that: works well!

marcelklehr · 2026-06-24T09:16:00Z

Error: Unknown (built-in) filter 'tojson' for type Undefined (hint: 'tools')

Mh, good catch! That seems like a model file incompatibility issue :/

should also fix the prompt template issue with the old olmo version
Signed-off-by: Marcel Klehr <mklehr@gmx.net>

julien-nc · 2026-06-24T14:37:40Z

No more chat template parsing error with Olmo-Think!

But with Olmo-Think, the reasoning content is reported as content. Tried the same task type with the same prompt with Qwen and the reasoning was reported correctly.

Also, I tried canceling a task and it seems llm2 is not stopping the process after reporting some intermediate output (the response from the /stream-result endpoint contains the task with its status). I don't think I tested that in the PR that added the streaming support. Can you have a look? Is it a bug, or is something just missing to support cancelling?

marcelklehr · 2026-06-25T08:11:01Z

But with Olmo-Think, the reasoning content is reported as content.

Damn, confirmed. Mmmh, so either we can't use Olmo at all, or only with the reasoning spilling out into the content. :/
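
For illustration, one possible fallback is to separate the reasoning client-side. This is a sketch under a loud assumption: it supposes the model wraps its reasoning in `<think>...</think>` tags, which this thread does not confirm for Olmo-Think. `split_reasoning` is a hypothetical helper and not what this PR does.

```python
import re

# Assumption: reasoning is delimited by <think>...</think>. Other models may
# use different markers, in which case this pattern would not match.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Split raw model output into (reasoning, content)."""
    reasoning = "\n".join(part.strip() for part in THINK_BLOCK.findall(text))
    content = THINK_BLOCK.sub("", text).strip()
    return reasoning, content


if __name__ == "__main__":
    print(split_reasoning("<think>step one</think>Final answer."))
```

Text without any tags comes back unchanged as content, with an empty reasoning string.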

canceling a task and it seems llm2 is not stopping the process after reporting some intermediate output

Ah, yes, that was not implemented. I can create a new PR once this is through.
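
As a rough idea of what cancellation support could look like, the streaming loop can check a cancellation flag between chunks. This is a sketch only: `stream_tokens`, `demo`, and the `asyncio.Event` flag are hypothetical, and in llm2 the flag would presumably be derived from the task status returned by /stream-result.

```python
import asyncio


async def stream_tokens(tokens, out: list, cancelled: asyncio.Event) -> str:
    # Hypothetical streaming loop: stop as soon as cancellation is signalled.
    for tok in tokens:
        if cancelled.is_set():
            return "cancelled"
        out.append(tok)  # stands in for reporting intermediate output
        await asyncio.sleep(0)  # yield so other coroutines can run
    return "done"


async def demo():
    cancelled = asyncio.Event()
    out: list = []

    async def cancel_after_two():
        # Simulates a user cancelling once some output has been reported.
        while len(out) < 2:
            await asyncio.sleep(0)
        cancelled.set()

    status, _ = await asyncio.gather(
        stream_tokens(list("abcdef"), out, cancelled),
        cancel_after_two(),
    )
    return status, out


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Checking a flag between chunks keeps cleanup simple. Calling `Task.cancel()` on the generation task is an alternative, at the cost of handling `CancelledError` wherever resources are held.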

Signed-off-by: Marcel Klehr <mklehr@gmx.net>

marcelklehr · 2026-06-25T09:37:25Z

Olmo 3 Instruct should work now

Signed-off-by: Marcel Klehr <mklehr@gmx.net>

marcelklehr changed the title ~~feat: Use llama-cpp-server to allow batching tasks~~ feat: Use llama-cpp-server to allow batching tasks and return reasoning content Jun 22, 2026

marcelklehr changed the title ~~feat: Use llama-cpp-server to allow batching tasks and return reasoning content~~ feat: Use xllamacpp to allow batching tasks and return reasoning content Jun 22, 2026

marcelklehr marked this pull request as ready for review June 22, 2026 12:06

marcelklehr added 7 commits June 22, 2026 14:08

feat: Use llama-cpp-server to allow batching tasks

0502b48

Signed-off-by: Marcel Klehr <mklehr@gmx.net>

fix: Fix health check and set batch size correctly

baa6bf8

Signed-off-by: Marcel Klehr <mklehr@gmx.net>

feat: Make processing async

81922de

Signed-off-by: Marcel Klehr <mklehr@gmx.net>

feat: Migrate from llama-cpp-python to xllamacpp

1f67ac7

Signed-off-by: Marcel Klehr <mklehr@gmx.net>

fix: Do not set tasks to running if they're only still queued

a46be66

Signed-off-by: Marcel Klehr <mklehr@gmx.net>

fix: Improve xllamacpp error handling

99665fb

Signed-off-by: Marcel Klehr <mklehr@gmx.net>

feat: Provide reasoning content for chat task types

975a562

Signed-off-by: Marcel Klehr <mklehr@gmx.net>

marcelklehr force-pushed the feat/llama-cpp-server branch from b7253d7 to 975a562 Compare June 22, 2026 12:08

marcelklehr added 2 commits June 22, 2026 14:12

chore: Update poetry lock file

307639f

Signed-off-by: Marcel Klehr <mklehr@gmx.net>

fix(ci): Use latest app_api version

38dd466

Signed-off-by: Marcel Klehr <mklehr@gmx.net>

julien-nc reviewed Jun 23, 2026

View reviewed changes

marcelklehr added 5 commits June 23, 2026 14:57

fix: Catch errors in the task loop and refresh processors upon init

eadb77e

Signed-off-by: Marcel Klehr <mklehr@gmx.net>

fix(ci): Make sure persistent_storage dir exists

758c18c

Signed-off-by: Marcel Klehr <mklehr@gmx.net>

fix(ci): Use the xllamacpp cpu wheel in CI

b8eb2cf

Signed-off-by: Marcel Klehr <mklehr@gmx.net>

feat: Add cuda and rocm docker builds

581ff68

Signed-off-by: Marcel Klehr <mklehr@gmx.net>

fix(ci): Install cpu build of xllamacpp correctly

c7c3043

Signed-off-by: Marcel Klehr <mklehr@gmx.net>

marcelklehr force-pushed the feat/llama-cpp-server branch from e385a3e to c7c3043 Compare June 24, 2026 07:48

marcelklehr added 3 commits June 24, 2026 09:51

fix(ci): Also test on stable34

d6e5c2f

Signed-off-by: Marcel Klehr <mklehr@gmx.net>

fix(ci): Upgrade python

ba1be60

Signed-off-by: Marcel Klehr <mklehr@gmx.net>

feat: Add reasoning output to all processors

2fd471b

Signed-off-by: Marcel Klehr <mklehr@gmx.net>

julien-nc reviewed Jun 24, 2026

View reviewed changes

feat: Switch to Olmo 3 Think

9e7c153

should also fix the prompt template issue with the old olmo version
Signed-off-by: Marcel Klehr <mklehr@gmx.net>

fix: Switch back to Olmo 3 Instruct and fix chat template

5cddedd

Signed-off-by: Marcel Klehr <mklehr@gmx.net>

fix: Improve tool parser support for Olmo and Qwen 3.5

715449e

Signed-off-by: Marcel Klehr <mklehr@gmx.net>

julien-nc self-requested a review June 25, 2026 12:19

marcelklehr merged commit 1a391f3 into main Jun 25, 2026
8 checks passed

marcelklehr deleted the feat/llama-cpp-server branch June 25, 2026 12:19

marcelklehr merged 20 commits into main from feat/llama-cpp-server
