I'm attempting to figure out the best way to add more specialized context to some of my FIM prompts.
This specialized context could be a variety of information that is directly helpful for this specific FIM request, but maybe not useful for other FIM requests in different locations.
I considered adding this specialized context to the prefix, but then it risks being truncated due to the batch size, so that doesn't make much sense.
AFAICT the best way to feed in this information is through an extra_input field (call it "special_extra_input"). I could add it to the END of the request's extra_inputs array, which as I understand it means it gets placed at the very start of the LLM prompt. HOWEVER, and please correct me if I'm wrong, if I then remove "special_extra_input" for the next prompt, it will effectively reset the caching, slowing future prompts down.
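To make the layout concrete, here is a minimal client-side sketch of building such a request. The field names (`input_prefix`, `input_suffix`, `input_extra`) are my assumption based on llama-server's infill endpoint; check your server version's docs, and note the comment about prompt ordering just restates the behavior described above rather than anything I've verified in the code:

```python
def build_infill_request(prefix, suffix, extra_inputs, special=None):
    """Build an infill request dict. `extra_inputs` is a list of
    {filename, text} chunks; `special` is the per-request chunk
    discussed above (hypothetical usage, not an official field)."""
    input_extra = list(extra_inputs)
    if special is not None:
        # Appending to the END of the array puts it (per the discussion
        # above) at the very START of the rendered LLM prompt.
        input_extra.append(special)
    return {
        "input_prefix": prefix,
        "input_suffix": suffix,
        "input_extra": input_extra,
        "n_predict": 64,
    }

req = build_infill_request(
    prefix="def area(r):\n    return ",
    suffix="\n",
    extra_inputs=[{"filename": "math_utils.py", "text": "PI = 3.14159\n"}],
    special={"filename": "hint.txt", "text": "use PI from math_utils\n"},
)
print([c["filename"] for c in req["input_extra"]])
# ['math_utils.py', 'hint.txt']
```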
What I really want is a way to add some extra content to the TAIL of the extra content in the prompt, so that if I remove it on the next prompt it doesn't affect the caching at all. I COULD simply add this chunk to the beginning of the extra_inputs array in the request, which would then put it at the end of the prompt like I want, but then I think I risk this information being truncated? (Am I wrong about this?)
(1) Normal prompt:
- Request: ExtInp1 ExtInp2 ExtInp3 prefix suffix
- LLM text: ExtInp3 ExtInp2 ExtInp1 prefix suffix

(2) Prompt with special-context:
- Request: SpecialExtInp ExtInp1 ExtInp2 ExtInp3 prefix suffix
- LLM text: ExtInp3 ExtInp2 ExtInp1 SpecialExtInp prefix suffix
- LLM text if truncation was required due to long context: ExtInp3 ExtInp2 SpecialExtInp prefix suffix
I'd like a way to ensure that SpecialExtInp doesn't get truncated even though it's at the tail of the extra inputs. If truncation of the extra inputs needs to happen, then in my situation I would want the next ExtInp to be removed, but not SpecialExtInp.
I'm curious whether this type of situation is possible today or would require changes to llama.cpp. It seems like what I want is actually something slightly different from extra_inputs: a new field that looks like an extra_input in the LLM prompt, but is always fed in right before the prefix+suffix area and never gets truncated.
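Server-side, the behavior being asked for could be sketched as a truncation policy that drops ordinary extra chunks but skips over the pinned one. This is purely hypothetical; no such field exists in llama.cpp as far as I know, and the names (`pinned`, `budget`) are made up for illustration. It mirrors the truncated example above, where ExtInp1 is dropped and SpecialExtInp survives:

```python
def truncate_extra(extra, pinned, budget):
    """Drop ordinary chunks until the total fits in `budget` tokens,
    never dropping the pinned chunk. Chunks are (name, n_tokens) pairs
    and the rendered prompt order is: extra + [pinned] + prefix/suffix.
    Chunks nearest the pinned one go first, matching the example above."""
    kept = list(extra)
    total = sum(n for _, n in kept) + pinned[1]
    while kept and total > budget:
        _, n = kept.pop()      # drop the chunk closest to pinned/prefix
        total -= n
    return kept + [pinned]

chunks = [("ExtInp3", 100), ("ExtInp2", 100), ("ExtInp1", 100)]
kept = truncate_extra(chunks, ("SpecialExtInp", 50), budget=260)
print([name for name, _ in kept])
# ['ExtInp3', 'ExtInp2', 'SpecialExtInp']
```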
(thoughts @ggerganov ?)