I'm attempting to figure out the best way to add more specialized context to some of my FIM prompts.
This specialized context could be a variety of information that is directly helpful for this specific FIM request, but maybe not useful for other FIM requests in different locations.
I considered adding this specialized context to the prefix, but then it risks being truncated due to the batch size, so that doesn't make much sense.
AFAICT the best way to feed in this information is through an extra_input field (call it "special_extra_input"). I could add it to the END of the request's extra_inputs array, which as I understand it means it gets placed at the very start of the LLM prompt. HOWEVER, and please correct me if I'm wrong, if I then remove "special_extra_input" for the next prompt, it will effectively reset the caching, slowing future prompts down.
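To make the layout concrete, here is a minimal client-side sketch of building such a request. The field names (`input_prefix`, `input_suffix`, `input_extra`) are my assumption based on llama-server's infill endpoint; check your server version's docs, and note the comment about prompt ordering just restates the behavior described above rather than anything I've verified in the code:

```python
def build_infill_request(prefix, suffix, extra_inputs, special=None):
    """Build an infill request dict. `extra_inputs` is a list of
    {filename, text} chunks; `special` is the per-request chunk
    discussed above (hypothetical usage, not an official field)."""
    input_extra = list(extra_inputs)
    if special is not None:
        # Appending to the END of the array puts it (per the discussion
        # above) at the very START of the rendered LLM prompt.
        input_extra.append(special)
    return {
        "input_prefix": prefix,
        "input_suffix": suffix,
        "input_extra": input_extra,
        "n_predict": 64,
    }

req = build_infill_request(
    prefix="def area(r):\n    return ",
    suffix="\n",
    extra_inputs=[{"filename": "math_utils.py", "text": "PI = 3.14159\n"}],
    special={"filename": "hint.txt", "text": "use PI from math_utils\n"},
)
print([c["filename"] for c in req["input_extra"]])
# ['math_utils.py', 'hint.txt']
```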
What I really want is a way to add some extra content to the TAIL of the extra content in the prompt, so that if I remove it on the next prompt it doesn't affect the caching at all. I COULD simply add this chunk to the beginning of the extra_inputs array in the request, which would then put it at the end of the prompt like I want, but then I think I risk this information being truncated? (Am I wrong about this?)
(1) Normal prompt:
- Request: ExtInp1 ExtInp2 ExtInp3 prefix suffix
- LLM text: ExtInp3 ExtInp2 ExtInp1 prefix suffix

(2) Prompt with special-context:
- Request: SpecialExtInp ExtInp1 ExtInp2 ExtInp3 prefix suffix
- LLM text: ExtInp3 ExtInp2 ExtInp1 SpecialExtInp prefix suffix
- LLM text if truncation was required due to long context: ExtInp3 ExtInp2 SpecialExtInp prefix suffix
I'd like a way to ensure that SpecialExtInp doesn't get truncated even though it's at the tail of the extra inputs. If truncation of the extra inputs needs to happen, then in my situation I would want the next ExtInp to be removed, but not SpecialExtInp.
I'm curious whether this type of situation is possible today or would require changes to llama.cpp. It seems like what I want is actually something slightly different from extra_inputs: a new field that looks like an extra_input in the LLM prompt, but is always fed in right before the prefix+suffix area and never gets truncated.
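Server-side, the behavior being asked for could be sketched as a truncation policy that drops ordinary extra chunks but skips over the pinned one. This is purely hypothetical; no such field exists in llama.cpp as far as I know, and the names (`pinned`, `budget`) are made up for illustration. It mirrors the truncated example above, where ExtInp1 is dropped and SpecialExtInp survives:

```python
def truncate_extra(extra, pinned, budget):
    """Drop ordinary chunks until the total fits in `budget` tokens,
    never dropping the pinned chunk. Chunks are (name, n_tokens) pairs
    and the rendered prompt order is: extra + [pinned] + prefix/suffix.
    Chunks nearest the pinned one go first, matching the example above."""
    kept = list(extra)
    total = sum(n for _, n in kept) + pinned[1]
    while kept and total > budget:
        _, n = kept.pop()      # drop the chunk closest to pinned/prefix
        total -= n
    return kept + [pinned]

chunks = [("ExtInp3", 100), ("ExtInp2", 100), ("ExtInp1", 100)]
kept = truncate_extra(chunks, ("SpecialExtInp", 50), budget=260)
print([name for name, _ in kept])
# ['ExtInp3', 'ExtInp2', 'SpecialExtInp']
```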
(thoughts @ggerganov ?)