fix(agent,graph): handle streaming multi-generation assistant me… #3283
Summary
This PR fixes how assistant messages are selected in both the non‑streaming and streaming paths when a ChatResponse contains multiple Generation objects. It ensures that the generation actually carrying tool calls is preferred over a plain text generation, and that usage‑only chunks (with null results or outputs) are filtered out instead of being dereferenced.
Background / Problem
Some providers return ChatResponse objects with multiple Generation entries for a single model invocation, and the generation containing the tool call is not necessarily the first one.
Before this change:
AgentLlmNode in non‑streaming mode used:
response.getResult().getOutput()
which assumes the first generation is always “the real” assistant message.
Graph‑core streaming (NodeExecutor.getEmbedFlux and GraphFluxGenerator) made the same assumption, accessing response.getResult().getOutput() directly on each chunk.
This leads to two main issues:
When the first generation contains a plain assistant message and a later generation contains the tool call, the system selects the plain message and completely ignores the tool call from the later generation.
Usage‑only chunks (with null result/output) are not filtered out early, and downstream code blindly dereferences getResult().getOutput().
Screenshots / Debugging Evidence
The following debugger screenshots illustrate the problem and how it manifests at runtime:
Non‑streaming path in AgentLlmNode before the fix
At the breakpoint in AgentLlmNode.apply(...), the code was:
if (response != null && response.getResult() != null) {
responseMessage = response.getResult().getOutput();
}
The debugger shows that the response contains multiple generations and that the tool call lives in a later one. However, the code only uses getResult() (the first generation), so the tool call from the second generation is completely ignored.
A similar situation can arise in streaming: individual ChatResponse chunks may contain multiple generations, and usage‑only chunks may have null outputs. The previous streaming code accessed response.getResult().getOutput() directly, which is fragile when the first generation is not the real assistant message, or when a chunk carries no output at all.
What this PR changes
1. AgentLlmNode (agent framework)
File:
spring-ai-alibaba-agent-framework/src/main/java/com/alibaba/cloud/ai/graph/agent/node/AgentLlmNode.java
Add a helper method:
private AssistantMessage extractAssistantMessage(ChatResponse response)
Behavior: the helper prefers a generation whose AssistantMessage has tool calls, falls back to the last assistant message otherwise, and as a last resort returns
new AssistantMessage("Empty response from model for unknown reason").
Both the non‑streaming path and the streaming path (reasoning logs) now use this helper, so the tool call is picked up even when it is not in the first generation.
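The selection rules described above ("prefer tool calls, fall back to the last assistant message, else a placeholder") can be sketched as follows. This is a self‑contained illustration, not the PR's actual code: the Spring AI types (ChatResponse, Generation, AssistantMessage) are replaced by minimal hypothetical stand‑ins so the logic can run on its own.

```java
import java.util.List;

// Hypothetical stand-ins for Spring AI's types, just enough to exercise the selection logic.
class AssistantMessage {
    private final String text;
    private final boolean toolCalls;
    AssistantMessage(String text) { this(text, false); }
    AssistantMessage(String text, boolean toolCalls) { this.text = text; this.toolCalls = toolCalls; }
    String getText() { return text; }
    boolean hasToolCalls() { return toolCalls; }
}
class Generation {
    private final AssistantMessage output;
    Generation(AssistantMessage output) { this.output = output; }
    AssistantMessage getOutput() { return output; }
}
class ChatResponse {
    private final List<Generation> results;
    ChatResponse(List<Generation> results) { this.results = results; }
    List<Generation> getResults() { return results; }
}

public class ExtractDemo {
    // Prefer the generation whose AssistantMessage carries tool calls;
    // otherwise fall back to the last non-null assistant message;
    // otherwise return the placeholder message described in the PR.
    static AssistantMessage extractAssistantMessage(ChatResponse response) {
        if (response == null || response.getResults() == null) {
            return new AssistantMessage("Empty response from model for unknown reason");
        }
        AssistantMessage last = null;
        for (Generation g : response.getResults()) {
            AssistantMessage out = (g == null) ? null : g.getOutput();
            if (out == null) continue;             // skip empty generations
            if (out.hasToolCalls()) return out;    // tool calls win, regardless of position
            last = out;
        }
        return last != null ? last
                : new AssistantMessage("Empty response from model for unknown reason");
    }

    public static void main(String[] args) {
        // First generation is plain text, second carries the tool call:
        ChatResponse multi = new ChatResponse(List.of(
                new Generation(new AssistantMessage("plain answer")),
                new Generation(new AssistantMessage("tool call", true))));
        System.out.println(extractAssistantMessage(multi).hasToolCalls()); // true

        // Usage-only chunk with no generations falls back to the placeholder:
        ChatResponse usageOnly = new ChatResponse(List.of());
        System.out.println(extractAssistantMessage(usageOnly).getText());
    }
}
```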
2. NodeExecutor (graph core)
File:
spring-ai-alibaba-graph-core/src/main/java/com/alibaba/cloud/ai/graph/executor/NodeExecutor.java
Add a helper:
private AssistantMessage extractAssistantMessage(ChatResponse response)
with the same preference rules as in AgentLlmNode, but returning null when no usable AssistantMessage exists (e.g. usage‑only chunks).
Update the filtering logic to keep only chunks for which extractAssistantMessage returns a non-null message. This naturally filters out usage‑only or non‑assistant chunks before mapping.
Update the mapping logic so each filtered chunk is mapped to its extracted assistant message instead of a raw getResult().getOutput() dereference. The completion logic is updated along the same lines.
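The extract‑then‑filter shape of the updated pipeline can be sketched with java.util.stream standing in for Reactor's Flux (a hypothetical extractText on plain strings stands in for the real extractAssistantMessage; the point is that extraction happens once and null results are dropped before mapping):

```java
import java.util.List;
import java.util.Objects;

public class FilterMapDemo {
    // Hypothetical stand-in extractor: returns null for "usage-only" chunks,
    // mirroring extractAssistantMessage returning null in NodeExecutor.
    static String extractText(String chunk) {
        return chunk.startsWith("usage:") ? null : chunk;
    }

    public static void main(String[] args) {
        List<String> chunks = List.of("hello", "usage:42", "world");
        // Extract once, drop nulls, then map — no blind dereference anywhere.
        List<String> texts = chunks.stream()
                .map(FilterMapDemo::extractText)
                .filter(Objects::nonNull)
                .toList();
        System.out.println(texts); // [hello, world]
    }
}
```

With Reactor the same shape applies to a Flux of ChatResponse chunks, with filter/map operators in place of the stream calls.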
3. GraphFluxGenerator (graph core)
File:
spring-ai-alibaba-graph-core/src/main/java/com/alibaba/cloud/ai/graph/streaming/GraphFluxGenerator.java
Add a helper:
private AssistantMessage extractAssistantMessage(ChatResponse response)
with the same “prefer tool calls, fallback to last assistant” semantics, returning null when no assistant message exists.
Update buildInternal(Flux flux) so that the mergeMessage function relies on extractAssistantMessage rather than on the first generation.
The chunk text function passed to GraphFlux.of is updated to:
response -> {
AssistantMessage message = extractAssistantMessage(response);
return message != null ? message.getText() : null;
}
This brings GraphFlux‑based streaming behavior in line with the agent/graph streaming behavior.
Rationale
ChatResponse can legitimately have multiple generations, and the first one is not guaranteed to be the one with tool calls. Choosing the right generation is critical for correct agent behavior.
Agents and tools should see the generation that actually contains tool calls. Prioritizing AssistantMessage.hasToolCalls() makes this explicit in both non‑streaming and streaming paths.
Usage‑only chunks and null outputs are a reality with some providers. Filtering these out early and avoiding raw getResult().getOutput() dereferences prevents NullPointerException and subtle bugs.
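To make the NullPointerException hazard concrete, here is a small self‑contained demonstration (hypothetical record stand‑ins, not the real Spring AI types): a usage‑only chunk has no generations, so the first‑generation accessor returns null and a blind dereference throws, while a null check lets callers filter the chunk instead.

```java
import java.util.List;

public class NullSafetyDemo {
    // Hypothetical stand-ins: a usage-only chunk has no generations,
    // so getResult() (the first generation) is null.
    record Output(String text) {}
    record Generation(Output output) {}
    record Chunk(List<Generation> results) {
        Generation getResult() { return results.isEmpty() ? null : results.get(0); }
    }

    public static void main(String[] args) {
        Chunk usageOnly = new Chunk(List.of());

        // Fragile pattern from before the fix: blind dereference throws NPE.
        try {
            String text = usageOnly.getResult().output().text();
            System.out.println(text);
        } catch (NullPointerException e) {
            System.out.println("NPE on usage-only chunk");
        }

        // Null-checked extraction returns null instead, letting callers filter early.
        Generation g = usageOnly.getResult();
        System.out.println(g == null ? "filtered" : g.output().text());
    }
}
```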
Non‑streaming and streaming code paths now share a very similar assistant‑message selection strategy, which makes behavior more predictable and easier to reason about.