fix(agent,graph): handle streaming multi-generation assistant me… #3283
Summary
This PR fixes how assistant messages are selected in both the non‑streaming and streaming paths when a ChatResponse contains multiple Generation objects. It ensures that the generation actually carrying tool calls is preferred over a plain text generation, and that usage‑only chunks (with null results or outputs) are filtered out instead of being dereferenced.
Background / Problem
Some providers return ChatResponse objects with multiple Generation entries for a single model invocation, and the generation containing the tool call is not necessarily the first one.
Before this change:
AgentLlmNode in non‑streaming mode used:
response.getResult().getOutput()
which assumes the first generation is always “the real” assistant message.
Graph‑core streaming (NodeExecutor.getEmbedFlux and GraphFluxGenerator) made the same assumption, accessing response.getResult().getOutput() directly on each chunk.
This leads to two main issues:
When the first generation contains a plain assistant message and a later generation contains the tool call, the system selects the plain message and completely ignores the tool call from the later generation.
Usage‑only chunks (with null result/output) are not filtered out early, and downstream code blindly dereferences getResult().getOutput().
Screenshots / Debugging Evidence
The following debugger screenshots illustrate the problem and how it manifests at runtime:
Non‑streaming path in AgentLlmNode before the fix
At the breakpoint in AgentLlmNode.apply(...), the code was:
if (response != null && response.getResult() != null) {
responseMessage = response.getResult().getOutput();
}
The debugger shows that the response contains multiple generations and that the tool call lives in a later one. However, the code only uses getResult() (the first generation), so the tool call from the second generation is completely ignored.
A similar situation can arise in streaming: individual ChatResponse chunks may contain multiple generations, and usage‑only chunks may have null outputs. The previous streaming code accessed response.getResult().getOutput() directly, which is fragile when the first generation is not the real assistant message, or when a chunk carries no output at all.
What this PR changes
1. AgentLlmNode (agent framework)
File:
spring-ai-alibaba-agent-framework/src/main/java/com/alibaba/cloud/ai/graph/agent/node/AgentLlmNode.java
Add a helper method:
private AssistantMessage extractAssistantMessage(ChatResponse response)
Behavior: the helper prefers a generation whose AssistantMessage has tool calls, falls back to the last assistant message otherwise, and as a last resort returns
new AssistantMessage("Empty response from model for unknown reason").
Both the non‑streaming path and the streaming path (reasoning logs) now use this helper, so the tool call is picked up even when it is not in the first generation.
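The selection rules described above ("prefer tool calls, fall back to the last assistant message, else a placeholder") can be sketched as follows. This is a self‑contained illustration, not the PR's actual code: the Spring AI types (ChatResponse, Generation, AssistantMessage) are replaced by minimal hypothetical stand‑ins so the logic can run on its own.

```java
import java.util.List;

// Hypothetical stand-ins for Spring AI's types, just enough to exercise the selection logic.
class AssistantMessage {
    private final String text;
    private final boolean toolCalls;
    AssistantMessage(String text) { this(text, false); }
    AssistantMessage(String text, boolean toolCalls) { this.text = text; this.toolCalls = toolCalls; }
    String getText() { return text; }
    boolean hasToolCalls() { return toolCalls; }
}
class Generation {
    private final AssistantMessage output;
    Generation(AssistantMessage output) { this.output = output; }
    AssistantMessage getOutput() { return output; }
}
class ChatResponse {
    private final List<Generation> results;
    ChatResponse(List<Generation> results) { this.results = results; }
    List<Generation> getResults() { return results; }
}

public class ExtractDemo {
    // Prefer the generation whose AssistantMessage carries tool calls;
    // otherwise fall back to the last non-null assistant message;
    // otherwise return the placeholder message described in the PR.
    static AssistantMessage extractAssistantMessage(ChatResponse response) {
        if (response == null || response.getResults() == null) {
            return new AssistantMessage("Empty response from model for unknown reason");
        }
        AssistantMessage last = null;
        for (Generation g : response.getResults()) {
            AssistantMessage out = (g == null) ? null : g.getOutput();
            if (out == null) continue;             // skip empty generations
            if (out.hasToolCalls()) return out;    // tool calls win, regardless of position
            last = out;
        }
        return last != null ? last
                : new AssistantMessage("Empty response from model for unknown reason");
    }

    public static void main(String[] args) {
        // First generation is plain text, second carries the tool call:
        ChatResponse multi = new ChatResponse(List.of(
                new Generation(new AssistantMessage("plain answer")),
                new Generation(new AssistantMessage("tool call", true))));
        System.out.println(extractAssistantMessage(multi).hasToolCalls()); // true

        // Usage-only chunk with no generations falls back to the placeholder:
        ChatResponse usageOnly = new ChatResponse(List.of());
        System.out.println(extractAssistantMessage(usageOnly).getText());
    }
}
```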
2. NodeExecutor (graph core)
File:
spring-ai-alibaba-graph-core/src/main/java/com/alibaba/cloud/ai/graph/executor/NodeExecutor.java
Add a helper:
private AssistantMessage extractAssistantMessage(ChatResponse response)
with the same preference rules as in AgentLlmNode, but returning null when no usable AssistantMessage exists (e.g. usage‑only chunks).
Update the filtering logic to keep only chunks for which extractAssistantMessage returns a non-null message. This naturally filters out usage‑only or non‑assistant chunks before mapping.
Update the mapping logic so each filtered chunk is mapped to its extracted assistant message instead of a raw getResult().getOutput() dereference. The completion logic is updated along the same lines.
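The extract‑then‑filter shape of the updated pipeline can be sketched with java.util.stream standing in for Reactor's Flux (a hypothetical extractText on plain strings stands in for the real extractAssistantMessage; the point is that extraction happens once and null results are dropped before mapping):

```java
import java.util.List;
import java.util.Objects;

public class FilterMapDemo {
    // Hypothetical stand-in extractor: returns null for "usage-only" chunks,
    // mirroring extractAssistantMessage returning null in NodeExecutor.
    static String extractText(String chunk) {
        return chunk.startsWith("usage:") ? null : chunk;
    }

    public static void main(String[] args) {
        List<String> chunks = List.of("hello", "usage:42", "world");
        // Extract once, drop nulls, then map — no blind dereference anywhere.
        List<String> texts = chunks.stream()
                .map(FilterMapDemo::extractText)
                .filter(Objects::nonNull)
                .toList();
        System.out.println(texts); // [hello, world]
    }
}
```

With Reactor the same shape applies to a Flux of ChatResponse chunks, with filter/map operators in place of the stream calls.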
3. GraphFluxGenerator (graph core)
File:
spring-ai-alibaba-graph-core/src/main/java/com/alibaba/cloud/ai/graph/streaming/GraphFluxGenerator.java
Add a helper:
private AssistantMessage extractAssistantMessage(ChatResponse response)
with the same “prefer tool calls, fallback to last assistant” semantics, returning null when no assistant message exists.
Update buildInternal(Flux flux) so that the mergeMessage function relies on extractAssistantMessage rather than on the first generation.
The chunk text function passed to GraphFlux.of is updated to:
response -> {
AssistantMessage message = extractAssistantMessage(response);
return message != null ? message.getText() : null;
}
This brings GraphFlux‑based streaming behavior in line with the agent/graph streaming behavior.
Rationale
ChatResponse can legitimately have multiple generations, and the first one is not guaranteed to be the one with tool calls. Choosing the right generation is critical for correct agent behavior.
Agents and tools should see the generation that actually contains tool calls. Prioritizing AssistantMessage.hasToolCalls() makes this explicit in both non‑streaming and streaming paths.
Usage‑only chunks and null outputs are a reality with some providers. Filtering these out early and avoiding raw getResult().getOutput() dereferences prevents NullPointerException and subtle bugs.
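To make the NullPointerException hazard concrete, here is a small self‑contained demonstration (hypothetical record stand‑ins, not the real Spring AI types): a usage‑only chunk has no generations, so the first‑generation accessor returns null and a blind dereference throws, while a null check lets callers filter the chunk instead.

```java
import java.util.List;

public class NullSafetyDemo {
    // Hypothetical stand-ins: a usage-only chunk has no generations,
    // so getResult() (the first generation) is null.
    record Output(String text) {}
    record Generation(Output output) {}
    record Chunk(List<Generation> results) {
        Generation getResult() { return results.isEmpty() ? null : results.get(0); }
    }

    public static void main(String[] args) {
        Chunk usageOnly = new Chunk(List.of());

        // Fragile pattern from before the fix: blind dereference throws NPE.
        try {
            String text = usageOnly.getResult().output().text();
            System.out.println(text);
        } catch (NullPointerException e) {
            System.out.println("NPE on usage-only chunk");
        }

        // Null-checked extraction returns null instead, letting callers filter early.
        Generation g = usageOnly.getResult();
        System.out.println(g == null ? "filtered" : g.output().text());
    }
}
```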
Non‑streaming and streaming code paths now share a very similar assistant‑message selection strategy, which makes behavior more predictable and easier to reason about.