Conversation

@zengchao304 zengchao304 commented Nov 28, 2025

Summary

This PR fixes how assistant messages are selected in both non‑streaming and streaming paths when ChatResponse contains multiple Generation objects. It ensures that:

  • Tool calls are not ignored when they appear in a non‑first generation.
  • Usage‑only streaming chunks (with null results/outputs) are skipped safely.
  • Agent‑level code (AgentLlmNode) and graph‑core streaming (NodeExecutor, GraphFluxGenerator) use a consistent assistant‑message selection strategy.

Background / Problem

Some providers return ChatResponse objects with multiple Generation entries for a single model invocation. At the same time:

  • ChatResponse#getResult() returns only the first generation.
  • Tool calls (AssistantMessage.hasToolCalls() == true) may be present in a later generation instead of the first one.
  • For streaming with usage reporting (e.g. streamUsage), providers may emit usage‑only chunks where:
    • ChatResponse.getResult() is null, or
    • Generation.getOutput() is null.
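
The failure mode can be sketched with simplified stand-in types (`Msg` here is illustrative only; the real Spring AI types are `ChatResponse`, `Generation`, and `AssistantMessage`):

```java
import java.util.List;

// Minimal stand-in for an assistant message; illustrative, not the Spring AI API.
record Msg(String text, boolean hasToolCalls) {}

public class MultiGenerationDemo {

    // A provider may return two generations for one invocation:
    // plain text first, the tool call in a later generation.
    static List<Msg> results = List.of(
            new Msg("Let me look that up.", false),
            new Msg("", true)); // the tool call lives here

    // getResult() semantics: only the first generation is returned.
    static Msg getResult() {
        return results.get(0);
    }

    public static void main(String[] args) {
        // Code that reads only getResult() never sees the tool call
        // carried by the second generation.
        System.out.println(getResult().hasToolCalls());                    // prints false
        System.out.println(results.stream().anyMatch(Msg::hasToolCalls)); // prints true
    }
}
```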

Before this change:

  • AgentLlmNode in non‑streaming mode used:

    response.getResult().getOutput()

    which assumes the first generation is always “the real” assistant message.

  • Graph‑core streaming (NodeExecutor.getEmbedFlux and GraphFluxGenerator) also assumed:

    • response.getResult() is non‑null;
    • response.getResult().getOutput() is the right message to stream and aggregate.

This leads to two main issues:

  1. Tool calls can be ignored.
    When the first generation contains a plain assistant message and a later generation contains the tool call, the system will:
    • Log or return the plain text;
    • Never expose the tool call to the agent/tooling layer.
  2. Streaming can see NullPointerException / inconsistent behavior.
    Usage‑only chunks (with null result/output) are not filtered out early, and downstream code blindly dereferences getResult().getOutput().

Screenshots / Debugging Evidence

The following debugger screenshots illustrate the problem and how it manifests at runtime:

  1. Non‑streaming path in AgentLlmNode before the fix
    At the breakpoint in AgentLlmNode.apply(...), the code was:

    if (response != null && response.getResult() != null) {
        responseMessage = response.getResult().getOutput();
    }

    The debugger shows that:

    • response.getResults() has two Generation entries.
    • The first generation’s AssistantMessage has no toolCalls.
    • The second generation’s AssistantMessage does contain tool calls.

    However, the code only uses getResult() (the first generation), so the tool call from the second generation is completely ignored.

  2. Streaming behavior
    A similar situation can arise in streaming: individual ChatResponse chunks may contain multiple generations, and usage‑only chunks may have null outputs. The previous streaming code accessed
    response.getResult().getOutput() directly, which is fragile when:
    • The tool call is in a non‑first generation, or
    • The chunk only carries usage metadata.

What this PR changes

1. AgentLlmNode (agent framework)

File:
spring-ai-alibaba-agent-framework/src/main/java/com/alibaba/cloud/ai/graph/agent/node/AgentLlmNode.java

  • Add a helper method:

    private AssistantMessage extractAssistantMessage(ChatResponse response)

    Behavior:

    • Scans all Generation objects via response.getResults().
    • Prefers the first AssistantMessage that has tool calls.
    • If none have tool calls, falls back to the last non‑null AssistantMessage.
    • If getResults() fails or is empty, falls back to response.getResult().getOutput() if it is an AssistantMessage.
    • If nothing usable is found, returns a default
      new AssistantMessage("Empty response from model for unknown reason").
  • Non‑streaming path:

    • Replace direct use of response.getResult().getOutput() with extractAssistantMessage(response) when selecting the final model response.
  • Streaming path (reasoning logs):

    • Replace direct use of chatResponse.getResult().getOutput() with extractAssistantMessage(chatResponse) so streaming logs reflect the selected assistant message (including tool calls) even if it
      is not in the first generation.
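
The selection strategy above can be sketched in plain Java. The `Assistant` record is a stand-in for `AssistantMessage`, and the method operates on a plain list rather than `ChatResponse#getResults()`; this is a sketch of the rules, not the actual `AgentLlmNode` code:

```java
import java.util.List;

// Illustrative stand-in for AssistantMessage; not the Spring AI type.
record Assistant(String text, boolean hasToolCalls) {}

public class ExtractDemo {

    // Prefer the first message with tool calls, otherwise fall back to the
    // last non-null message, otherwise return the default placeholder.
    static Assistant extractAssistantMessage(List<Assistant> generations) {
        Assistant lastNonNull = null;
        if (generations != null) {
            for (Assistant msg : generations) {
                if (msg == null) {
                    continue;
                }
                if (msg.hasToolCalls()) {
                    return msg; // tool calls win, wherever they appear
                }
                lastNonNull = msg;
            }
        }
        if (lastNonNull != null) {
            return lastNonNull;
        }
        return new Assistant("Empty response from model for unknown reason", false);
    }

    public static void main(String[] args) {
        List<Assistant> gens = List.of(
                new Assistant("plain text", false),
                new Assistant("", true));
        // The tool-call message is selected even though it is not first.
        System.out.println(extractAssistantMessage(gens).hasToolCalls()); // prints true
    }
}
```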

2. NodeExecutor (graph core)

File:
spring-ai-alibaba-graph-core/src/main/java/com/alibaba/cloud/ai/graph/executor/NodeExecutor.java

  • In getEmbedFlux(...), which processes embedded Flux:
    • Add a helper:

      private AssistantMessage extractAssistantMessage(ChatResponse response)

      with the same preference rules as in AgentLlmNode, but returning null when no usable AssistantMessage exists (e.g. usage‑only chunks).

    • Update the filtering logic:

      • Previously: keep all ChatResponse objects where response.getResult() != null.
      • Now: only keep ChatResponse objects where extractAssistantMessage(response) != null.
        This naturally filters out usage‑only or non‑assistant chunks before mapping.
    • Update the mapping logic:

      • Replace all direct response.getResult().getOutput() accesses with extractAssistantMessage(response).
      • Maintain lastChatResponseRef as a normalized ChatResponse whose single generation’s output is the selected AssistantMessage.
      • For each new chunk:
        • If there is no previous response:
          • Stream the current AssistantMessage and normalize lastChatResponseRef.
        • If the current assistant message has tool calls:
          • Stream that message and replace lastChatResponseRef with a normalized response containing that message.
        • Otherwise:
          • Concatenate lastMessage.text and currentMessage.text into a new AssistantMessage, update lastChatResponseRef with an aggregated ChatResponse, and stream only the current chunk
            message.
    • Completion logic:

      • Previously: used lastChatResponseRef.get().getResult().getOutput() directly.
      • Now: uses extractAssistantMessage(lastChatResponseRef.get()) as the final message and stores that in the completion result map (under the key and under messages when applicable).
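
The aggregation rules above can be sketched over a plain list standing in for the `Flux` (the `Chunk` record is illustrative; a `null` element models a usage-only chunk whose extracted assistant message is null):

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for a streamed assistant-message chunk; illustrative only.
record Chunk(String text, boolean hasToolCalls) {}

public class StreamAggregateDemo {

    // Sketch of the aggregation rules: skip usage-only (null) chunks, let a
    // tool-call chunk replace the aggregate, and concatenate text otherwise.
    static Chunk aggregate(List<Chunk> chunks) {
        Chunk last = null;
        for (Chunk current : chunks) {
            if (current == null) {
                continue; // usage-only chunk: nothing to aggregate
            }
            if (last == null || current.hasToolCalls()) {
                last = current; // first chunk, or tool-call override
            } else {
                last = new Chunk(last.text() + current.text(), false);
            }
        }
        return last;
    }

    public static void main(String[] args) {
        List<Chunk> chunks = new ArrayList<>();
        chunks.add(new Chunk("Hello, ", false));
        chunks.add(new Chunk("world", false));
        chunks.add(null); // usage-only chunk is skipped
        System.out.println(aggregate(chunks).text()); // prints Hello, world
    }
}
```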

3. GraphFluxGenerator (graph core)

File:
spring-ai-alibaba-graph-core/src/main/java/com/alibaba/cloud/ai/graph/streaming/GraphFluxGenerator.java

  • In GraphFluxGenerator.Builder:
    • Add a helper:

      private AssistantMessage extractAssistantMessage(ChatResponse response)

      with the same “prefer tool calls, fallback to last assistant” semantics, returning null when no assistant message exists.

    • Update buildInternal(Flux flux):

      • The mergeMessage function now:

        • Uses extractAssistantMessage(response) as currentMessage.
        • If currentMessage == null, treat the chunk as usage‑only and keep the previously aggregated ChatResponse.
        • If it is the first chunk, normalize the response into one generation with currentMessage.
        • If it is not the first chunk:
          • If currentMessage.hasToolCalls(), override the aggregated response with the current message.
          • Otherwise, concatenate lastMessage.text and currentMessage.text into a new AssistantMessage, build a new aggregated ChatResponse, and return it.
      • The chunk text function passed to GraphFlux.of is updated to:

        response -> {
            AssistantMessage message = extractAssistantMessage(response);
            return message != null ? message.getText() : null;
        }

This aligns GraphFlux‑based streaming with the assistant‑message selection used in the agent and graph‑core streaming paths.

Rationale

  • Multi‑generation correctness:
    ChatResponse can legitimately have multiple generations, and the first one is not guaranteed to be the one with tool calls. Choosing the right generation is critical for correct agent behavior.
  • Tool‑call friendliness:
    Agents and tools should see the generation that actually contains tool calls. Prioritizing AssistantMessage.hasToolCalls() makes this explicit in both non‑streaming and streaming paths.
  • Streaming robustness:
    Usage‑only chunks and null outputs are a reality with some providers. Filtering these out early and avoiding raw getResult().getOutput() dereferences prevents NullPointerException and subtle bugs.
  • Consistency:
    Non‑streaming and streaming code paths now share a very similar assistant‑message selection strategy, which makes behavior more predictable and easier to reason about.

@CLAassistant

CLAassistant commented Nov 28, 2025

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions bot added the area/graph SAA Grpah module label Nov 28, 2025
@zengchao304 zengchao304 changed the title fix(agent,graph-core): handle streaming multi-generation assistant me… fix(agent,graph): handle streaming multi-generation assistant me… Nov 28, 2025
@chickenlj
Collaborator

Relate to #3138

@zengchao304 zengchao304 force-pushed the fix-multi-generation-toolcall branch from c5d5eb8 to fe4f225 Compare December 1, 2025 08:54
…ssages

(cherry picked from commit 30b4f67dd53cc6685c0674cb6e7b58c54fb20a63)
  - Update NodeExecutor.getEmbedFlux to correctly handle text -> tool_call -> text streaming:
    - Remove the assumption that the previous chunk always has non-null text
    - Always merge text and toolCalls instead of overwriting aggregated content on tool chunks
  - Add AssistantMessageUtils.mergeToolCalls to deduplicate and preserve tool calls by id
  - Align GraphFluxGenerator streaming aggregation with the same text + toolCalls merge strategy
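
A dedupe-by-id merge along these lines could look as follows (the `ToolCall` record is a stand-in for `AssistantMessage.ToolCall`; this is not the actual AssistantMessageUtils code):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-in for a tool call; not the Spring AI type.
record ToolCall(String id, String name) {}

public class MergeToolCallsDemo {

    // Merge two tool-call lists, preserving encounter order and deduplicating
    // by id (a later entry with an already-seen id is ignored).
    static List<ToolCall> mergeToolCalls(List<ToolCall> first, List<ToolCall> second) {
        Map<String, ToolCall> byId = new LinkedHashMap<>();
        for (ToolCall call : first) {
            byId.putIfAbsent(call.id(), call);
        }
        for (ToolCall call : second) {
            byId.putIfAbsent(call.id(), call);
        }
        return new ArrayList<>(byId.values());
    }

    public static void main(String[] args) {
        List<ToolCall> merged = mergeToolCalls(
                List.of(new ToolCall("1", "search")),
                List.of(new ToolCall("1", "search"), new ToolCall("2", "fetch")));
        System.out.println(merged.size()); // prints 2
    }
}
```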
…7 compatibility

  - Remove Java 21+ instanceof pattern matching in AgentLlmNode and use
    classic instanceof checks with explicit casts so the code compiles
    under JDK 17.
  - Slightly refactor the assistant message fallback extraction to avoid
    pattern binding while preserving behavior.
  - Clean up an unused static import (requireNonNull) in NodeExecutor.
@zengchao304 zengchao304 force-pushed the fix-multi-generation-toolcall branch from 663bfe2 to 9938c51 Compare December 1, 2025 16:14