Description
The explicit cache control implementation in both DashScopeChatModel and OpenAIChatModel places cache_control at the message level, but the DashScope official documentation requires it to be placed inside the content block (within the content array). This applies to both the DashScope native protocol and the OpenAI-compatible protocol.
Current Behavior
When GenerateOptions.cacheControl(true) is enabled, both DashScopeChatFormatter.applyCacheControl() and OpenAIBaseFormatter.applyCacheControl() set cache_control on the message object directly (message level).
DashScope formatter:
public void applyCacheControl(List<DashScopeMessage> messages) {
    for (DashScopeMessage msg : messages) {
        if ("system".equals(msg.getRole()) && msg.getCacheControl() == null) {
            msg.setCacheControl(EPHEMERAL_CACHE_CONTROL); // message level
        }
    }
    DashScopeMessage lastMsg = messages.get(messages.size() - 1);
    if (lastMsg.getCacheControl() == null) {
        lastMsg.setCacheControl(EPHEMERAL_CACHE_CONTROL); // message level
    }
}
OpenAI formatter has the same logic in OpenAIBaseFormatter.applyCacheControl().
This produces the following JSON for both protocols:
{
  "role": "system",
  "content": "You are a helpful assistant.",
  "cache_control": {"type": "ephemeral"}
}
Expected Behavior
Per the official documentation, cache_control must be placed inside a content block, and content must be in array format:
{
  "role": "system",
  "content": [
    {
      "type": "text",
      "text": "You are a helpful assistant.",
      "cache_control": {"type": "ephemeral"}
    }
  ]
}
The documentation states:
The content field must be changed to array form, and a cache_control field added.
This format requirement applies to both OpenAI-compatible and DashScope native protocols when calling DashScope models.
Issues Identified
- cache_control is placed at the wrong level — should be inside content blocks, not at message level. This affects both DashScopeChatFormatter and OpenAIBaseFormatter.
- Content part DTOs lack a cache_control field — both DashScopeContentPart and OpenAIMessage content parts have no way to carry cache_control at the content block level.
- Multimodal messages are not handled — when content is already a list of content parts, cache_control still goes to the message level and won't be recognized by the API.
- No guard for the 4-marker limit — the documentation states a maximum of 4 cache_control markers per request. If there are multiple system messages (e.g., injected by SkillHook or LongTermMemoryHook), the limit may be exceeded silently.
Suggested Fix
DashScope native protocol
- Add a cache_control field (Map<String, String>) to DashScopeContentPart.
- Modify DashScopeChatFormatter.applyCacheControl() to:
  - Convert string content to array format (List<DashScopeContentPart>) for target messages.
  - Set cache_control on the last content block within each target message.
- Apply the same fix to DashScopeMultiAgentFormatter.applyCacheControl().
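A minimal sketch of the content-block-level approach, using simplified stand-in DTOs (`Part`, `Message`) rather than the real `DashScopeContentPart`/`DashScopeMessage`; the marking targets (system messages + last message) follow the current strategy — only the placement of cache_control changes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Simplified stand-ins for DashScopeContentPart / DashScopeMessage.
class Part {
    String type = "text";
    String text;
    Map<String, String> cacheControl; // serialized as "cache_control"
    Part(String text) { this.text = text; }
}

class Message {
    String role;
    Object content; // String or List<Part>
    Message(String role, Object content) { this.role = role; this.content = content; }
}

class CacheControlSketch {
    static final Map<String, String> EPHEMERAL = Map.of("type", "ephemeral");

    // Normalize string content to array form, then mark the LAST content
    // block of the message instead of the message object itself.
    @SuppressWarnings("unchecked")
    static void markLastContentBlock(Message msg) {
        List<Part> parts;
        if (msg.content instanceof String) {
            parts = new ArrayList<>();
            parts.add(new Part((String) msg.content)); // "content": "..." -> [{"type":"text",...}]
            msg.content = parts;
        } else {
            parts = (List<Part>) msg.content; // already multimodal
        }
        Part last = parts.get(parts.size() - 1);
        if (last.cacheControl == null) {
            last.cacheControl = EPHEMERAL;
        }
    }

    static void applyCacheControl(List<Message> messages) {
        for (Message m : messages) {
            if ("system".equals(m.role)) markLastContentBlock(m);
        }
        markLastContentBlock(messages.get(messages.size() - 1));
    }
}
```

Because string content is normalized to a part list first, the same code path covers multimodal messages whose content is already an array.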
OpenAI-compatible protocol
- Modify OpenAIBaseFormatter.applyCacheControl() with the same content-block-level approach.
- Add cache_control support to OpenAI content part DTOs.
Common
- Add a guard to ensure no more than 4 cache_control markers per request.
- Keep the existing message-level cacheControl fields for backward compatibility (manual metadata marking).
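The 4-marker guard could look something like the following sketch; `enforceLimit` is a hypothetical helper operating on the ascending indexes of messages slated for marking, keeping the first marker (the most stable prefix) plus the latest markers (the longest prefixes, including the last message):

```java
import java.util.ArrayList;
import java.util.List;

class MarkerGuard {
    // DashScope accepts at most 4 cache_control markers per request;
    // beyond that, only the last 4 take effect.
    static final int MAX_MARKERS = 4;

    // markedIndexes: ascending message indexes slated for a cache_control marker.
    static List<Integer> enforceLimit(List<Integer> markedIndexes) {
        if (markedIndexes.size() <= MAX_MARKERS) {
            return new ArrayList<>(markedIndexes);
        }
        List<Integer> kept = new ArrayList<>();
        // Keep the first marker: the shortest, most stable prefix block.
        kept.add(markedIndexes.get(0));
        // Fill the remaining slots from the tail, so the longest prefixes
        // (including the last message) survive.
        kept.addAll(markedIndexes.subList(
                markedIndexes.size() - (MAX_MARKERS - 1), markedIndexes.size()));
        return kept;
    }
}
```

Without such a guard, exceeding the limit silently drops the earliest markers — exactly the stable system prompts that benefit most from caching.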
Affected Classes
DashScopeChatFormatter
DashScopeMultiAgentFormatter
DashScopeContentPart
DashScopeMessage
OpenAIBaseFormatter
OpenAIMessage
Discussion: Should applyCacheControl() auto-mark messages?
The current applyCacheControl() strategy is: "all system messages + last message". Given DashScope's prefix-matching caching mechanism, the strategy pattern itself is sound in theory:
- Marking system messages creates layered prefix cache blocks (A, AB, ABC…). Even if later messages change, the shorter prefix (e.g., just the stable system prompt) can still be hit — a reasonable tiered caching approach.
- Marking the last message caches the entire messages array as a complete prefix, which aligns with the official "continuous multi-turn dialog" pattern.
However, there are two practical concerns:
1. The 4-marker limit
The API enforces a hard limit of 4 cache_control markers per request. If more than 4 markers are present, only the last 4 take effect. In AgentScope, multiple hooks can dynamically inject system messages (e.g., SkillHook, LongTermMemoryHook, RAGHook), making the number of system messages unpredictable at the formatter level. When the total marker count exceeds 4, the earliest system messages — which are typically the most stable and most valuable to cache — will lose their markers and fall out of the cache.
2. Dynamic content defeats prefix caching
The framework cannot distinguish between stable system messages (e.g., the user's own system prompt) and dynamic ones (e.g., RAG-retrieved knowledge, long-term memory summaries). In AgentScope's hook architecture, hooks like GenericRAGHook and StaticLongTermMemoryHook inject system messages whose content changes on every request. Marking these with cache_control means each request creates a new cache block (at 125% of standard input cost) that will likely never be hit — the prefix changes every time.
Only the user knows which parts of their messages are stable and worth caching. A blanket "mark all system messages" strategy applied at the formatter level cannot make this distinction.
Suggestion
Consider making cache control user-driven rather than automatic:
- The existing MessageMetadataKeys.CACHE_CONTROL mechanism already allows users to mark individual Msg objects for caching via metadata, which flows through applyCacheControlFromMetadata().
- The automatic applyCacheControl() strategy could be removed or made opt-in, letting users who understand their caching needs and cost tolerance decide which messages to mark.
- If keeping an automatic strategy, add a guard to enforce the 4-marker limit and prioritize stable prefixes (first system message + last message).
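As a rough illustration of the user-driven direction — all class shapes below are hypothetical stand-ins; only the key name MessageMetadataKeys.CACHE_CONTROL comes from the existing mechanism:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for AgentScope's Msg and MessageMetadataKeys.
class MessageMetadataKeys {
    static final String CACHE_CONTROL = "cache_control";
}

class Msg {
    final String role;
    final String text;
    final Map<String, Object> metadata = new HashMap<>();
    Msg(String role, String text) { this.role = role; this.text = text; }
}

class UserDrivenCaching {
    // The formatter marks only what the caller opted in to: stable prompts
    // get cached, dynamic hook-injected content does not.
    static boolean shouldCache(Msg m) {
        return Boolean.TRUE.equals(m.metadata.get(MessageMetadataKeys.CACHE_CONTROL));
    }
}
```

A stable user-authored system prompt would set the flag; a RAG hook injecting per-request context would leave it unset, avoiding cache blocks that can never be hit.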