
fix(dify): strip <think> tags from Dify runner output#6619

Open
NayukiChiba wants to merge 2 commits into AstrBotDevs:master from NayukiChiba:fix/dify-think-tag-filter

Conversation


@NayukiChiba NayukiChiba commented Mar 19, 2026

What does this PR do?

Fixes #6437

Some underlying models (such as DeepSeek-R1) still embed chain-of-thought content in <think>...</think> form even when thinking mode is disabled on the Dify side. Because the Dify runner did not filter it, these tags were passed through to users verbatim.

Changes

  • Added a static method _strip_think_tags() to dify_agent_runner.py that removes <think>...</think> blocks and stray orphan </think> tags, consistent with the existing handling in openai_source._parse_openai_completion.
  • Applied the filter on every output path of parse_dify_result(), covering the chat/agent/chatflow string results as well as the workflow text, list, and fallback outputs.
  • Also filter incremental streaming chunks, so <think> content is not printed to users piece by piece in streaming mode.
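Judging from the two regex lines quoted later in this review thread, the filter is roughly the following shape; this is a minimal self-contained sketch (the method name comes from the PR, the body is reconstructed from the quoted lines):

```python
import re

def strip_think_tags(text: str) -> str:
    """Remove <think>...</think> blocks and a stray trailing </think>."""
    text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    text = re.sub(r"</think>\s*$", "", text)
    return text.strip()
```

The non-greedy `.*?` with `re.DOTALL` keeps each block removal bounded at the nearest closing tag, and the second pattern catches models that emit only a dangling `</think>`.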

Summary by Sourcery

Strip unintended chain-of-thought tags from Dify runner responses before delivering them to users.

Bug Fixes:

  • Filter out <think>...</think> chain-of-thought blocks and trailing </think> tags from non-streaming Dify chat and workflow outputs.
  • Sanitize streaming delta chunks from Dify to avoid emitting <think>-wrapped content incrementally to clients.

@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Mar 19, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses an issue in the Dify agent runner where some underlying models still include <think>...</think> tags in their output even when thinking mode is disabled on the Dify side. By introducing a shared sanitization method and applying it to all output paths, it ensures the responses delivered to users are clean of these internal reasoning tags, improving the user experience and keeping the output consistent.

Highlights

  • New tag-stripping method: adds a static method _strip_think_tags() to dify_agent_runner.py that removes unexpected <think>...</think> blocks and orphan </think> tags from model output.
  • Filter applied comprehensively: the method is applied on every output path of parse_dify_result(), covering the chat, agent, and chatflow string results as well as the workflow text, list, and fallback outputs.
  • Streaming output filtering: incremental streaming chunks are filtered as well, so users never see <think> content in streaming mode.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@NayukiChiba NayukiChiba marked this pull request as draft March 19, 2026 11:06
@dosubot dosubot bot added the area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. label Mar 19, 2026
Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 1 issue, and left some high level feedback:

  • In parse_dify_result, for the workflow branch where output is a list, the non-file case inside the for item in output loop still does str(output) instead of str(item), which will duplicate the whole list instead of just the current element and likely isn't what you want.
  • Consider short-circuiting _strip_think_tags (e.g., if '<think' not in text and '</think>' not in text: return text.strip()) to avoid running two regexes on every chunk that doesn't contain these tags, especially in streaming mode where this method is called very frequently.
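The first point might translate into a loop like this hypothetical sketch (`join_workflow_outputs` and its file handling are illustrative, not the runner's actual code; the key change is `str(item)` in place of `str(output)`):

```python
def join_workflow_outputs(output: list) -> str:
    """Join workflow list output into one string, item by item."""
    parts = []
    for item in output:
        if isinstance(item, dict) and item.get("type") == "file":
            parts.append(item.get("url", ""))
        else:
            # Was str(output) in the reviewed code, which would repeat
            # the entire list once per element.
            parts.append(str(item))
    return "\n".join(parts)
```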
## Individual Comments

### Comment 1
<location path="astrbot/core/agent/runners/dify/dify_agent_runner.py" line_range="196-195" />
<code_context>
-                                    chain=MessageChain().message(chunk["answer"])
-                                ),
-                            )
+                            delta = self._strip_think_tags(chunk["answer"])
+                            if delta:
+                                yield AgentResponse(
+                                    type="streaming_delta",
+                                    data=AgentResponseData(
</code_context>
<issue_to_address>
**issue (bug_risk):** Streaming stripping of <think> tags may leak partial chain-of-thought when tags span multiple chunks.

Because `_strip_think_tags` runs per chunk, a `<think>` block split across chunks won’t be fully removed until the closing tag appears in the same chunk. Earlier chunks can leak partial chain-of-thought, and the final chunk may contain only the tail of the visible answer. To avoid this, buffer while inside `<think>...</think>` and only emit deltas when outside those regions, instead of stripping per chunk.
</issue_to_address>
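The buffered approach Sourcery suggests could be sketched as a small stateful filter (a standalone illustration, not the PR's code; it assumes a tag is never split mid-token across chunks, e.g. "<thi" + "nk>"):

```python
class ThinkTagStreamFilter:
    """Suppress text between <think> and </think> even when the two
    tags arrive in different streaming chunks."""

    def __init__(self) -> None:
        self._in_think = False  # True while inside an open <think> block

    def feed(self, chunk: str) -> str:
        out = []
        while chunk:
            if self._in_think:
                end = chunk.find("</think>")
                if end == -1:
                    # Still inside the think block: emit nothing yet.
                    return "".join(out)
                chunk = chunk[end + len("</think>"):]
                self._in_think = False
            else:
                start = chunk.find("<think>")
                if start == -1:
                    out.append(chunk)
                    break
                out.append(chunk[:start])
                chunk = chunk[start + len("<think>"):]
                self._in_think = True
        return "".join(out)
```

One instance per stream keeps the open/closed state across chunks, so a `<think>` opened in one chunk silences every later chunk until its `</think>` arrives.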


@dosubot

dosubot bot commented Mar 19, 2026

Related Documentation

1 document(s) may need updating based on files changed in this PR:

AstrBotTeam's Space

Changes from pr4697


Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This PR addresses the problem of underlying models leaking <think> tags to users. The change introduces a _strip_think_tags method to filter these tags and correctly applies it on every streaming and non-streaming output path. The implementation is direct and effective. I have one suggestion about optimizing the regular-expression work to improve performance and code conciseness.

Comment on lines +293 to +294
text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
text = re.sub(r"</think>\s*$", "", text)

medium

For efficiency and conciseness, the two re.sub calls can be merged into one. If this function is called frequently (for example on every streaming chunk), precompiling the regular expression at class level (e.g. _THINK_TAG_PATTERN = re.compile(...)) can improve performance further.

Suggested change
- text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
- text = re.sub(r"</think>\s*$", "", text)
+ text = re.sub(r"<think>.*?</think>|</think>\s*$", "", text, flags=re.DOTALL)
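Combining this suggestion with Sourcery's short-circuit idea might look like the following sketch (illustrative only; the pattern name and fast-path check are assumptions, not merged code):

```python
import re

# Precompiled once at module/class level; matches full <think> blocks
# or a dangling </think> at the end of the text.
_THINK_TAG_PATTERN = re.compile(r"<think>.*?</think>|</think>\s*$", re.DOTALL)

def strip_think_tags(text: str) -> str:
    # Fast path: most chunks contain no think tags at all, which matters
    # in streaming mode where this runs once per chunk.
    if "think>" not in text:
        return text.strip()
    return _THINK_TAG_PATTERN.sub("", text).strip()
```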

@NayukiChiba NayukiChiba marked this pull request as ready for review March 19, 2026 11:13

Labels

area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. size:M This PR changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant