fix(dify): strip <think> tags from Dify runner output #6619
NayukiChiba wants to merge 2 commits into AstrBotDevs:master
Summary of Changes (Gemini Code Assist)
This pull request addresses an issue in the Dify agent runner: some underlying models still include `<think>` content in their output even when thinking mode is disabled on the Dify side.
Hey - I've found 1 issue, and left some high level feedback:
- In `parse_dify_result`, for the workflow branch where `output` is a list, the non-file case inside the `for item in output` loop still does `str(output)` instead of `str(item)`, which will duplicate the whole list instead of just the current element and likely isn't what you want.
- Consider short-circuiting `_strip_think_tags` (e.g., `if '<think' not in text and '</think>' not in text: return text.strip()`) to avoid running two regexes on every chunk that doesn't contain these tags, especially in streaming mode where this method is called very frequently.
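Both points above can be illustrated with a minimal sketch. The function names `strip_think_tags` and `render_text_items` are hypothetical stand-ins, not the PR's actual code; the two regexes are the ones quoted in this review, and the trailing `.strip()` is an assumption taken from the reviewer's suggested early return.

```python
import re


def strip_think_tags(text: str) -> str:
    """Remove <think>...</think> blocks and a trailing orphan </think>."""
    # Fast path: most chunks contain no tag at all, so skip both regexes.
    if "<think" not in text and "</think>" not in text:
        return text.strip()
    text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    text = re.sub(r"</think>\s*$", "", text)
    return text.strip()


def render_text_items(output: list) -> str:
    """Join the filtered text form of each list element, one per line."""
    # str(item), not str(output): the latter would repeat the whole
    # list once per element.
    return "\n".join(strip_think_tags(str(item)) for item in output)
```

The fast path only changes cost, not results: text without either tag is returned stripped, exactly as the two substitutions would leave it.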
## Individual Comments
### Comment 1
<location path="astrbot/core/agent/runners/dify/dify_agent_runner.py" line_range="196-195" />
<code_context>
- chain=MessageChain().message(chunk["answer"])
- ),
- )
+ delta = self._strip_think_tags(chunk["answer"])
+ if delta:
+ yield AgentResponse(
+ type="streaming_delta",
+ data=AgentResponseData(
</code_context>
<issue_to_address>
**issue (bug_risk):** Streaming stripping of <think> tags may leak partial chain-of-thought when tags span multiple chunks.
Because `_strip_think_tags` runs per chunk, a `<think>` block split across chunks won’t be fully removed until the closing tag appears in the same chunk. Earlier chunks can leak partial chain-of-thought, and the final chunk may contain only the tail of the visible answer. To avoid this, buffer while inside `<think>...</think>` and only emit deltas when outside those regions, instead of stripping per chunk.
</issue_to_address>
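The buffering approach suggested in this comment could look roughly like the sketch below. `ThinkTagStreamFilter` and its methods are hypothetical names, not part of AstrBot or Dify. It assumes tags appear literally as `<think>` and `</think>`, and it does not handle an orphan `</think>` with no opening tag, since text already sent to the user cannot be retracted.

```python
class ThinkTagStreamFilter:
    """Stateful filter that hides <think>...</think> across chunk boundaries."""

    OPEN = "<think>"
    CLOSE = "</think>"

    def __init__(self) -> None:
        self._buf = ""        # text not yet classified as visible or hidden
        self._inside = False  # True while between <think> and </think>

    @staticmethod
    def _partial_suffix_len(text: str, tag: str) -> int:
        # Length of the longest suffix of text that is a proper prefix of tag.
        for n in range(min(len(text), len(tag) - 1), 0, -1):
            if text.endswith(tag[:n]):
                return n
        return 0

    def feed(self, chunk: str) -> str:
        """Consume one chunk and return the text that is safe to emit now."""
        self._buf += chunk
        out = []
        while self._buf:
            tag = self.CLOSE if self._inside else self.OPEN
            idx = self._buf.find(tag)
            if idx != -1:
                if not self._inside:
                    out.append(self._buf[:idx])
                self._buf = self._buf[idx + len(tag):]
                self._inside = not self._inside
                continue
            # No complete tag: hold back a suffix that may be a split tag.
            keep = self._partial_suffix_len(self._buf, tag)
            cut = len(self._buf) - keep
            if not self._inside:
                out.append(self._buf[:cut])
            self._buf = self._buf[cut:]
            break
        return "".join(out)

    def flush(self) -> str:
        """Call at end of stream; drops any unterminated <think> content."""
        tail = "" if self._inside else self._buf
        self._buf = ""
        self._inside = False
        return tail
```

In a streaming loop, one filter instance would live for the whole response: each incoming chunk goes through `feed()`, a delta is yielded only when the result is non-empty, and `flush()` is called once after the last chunk.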
    text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    text = re.sub(r"</think>\s*$", "", text)
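For efficiency and conciseness, these two `re.sub` calls can be merged into one. If this function is called frequently (for example, on every chunk during streaming), precompiling the regular expression at class level (e.g., `_THINK_TAG_PATTERN = re.compile(...)`) can further improve performance.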
- text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
- text = re.sub(r"</think>\s*$", "", text)
+ text = re.sub(r"<think>.*?</think>|</think>\s*$", "", text, flags=re.DOTALL)
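A minimal sketch of the precompiled variant follows. The class name `ThinkTagCleaner` and the method `clean` are hypothetical; only the pattern itself comes from the suggestion above, and the trailing `.strip()` is an assumption about the surrounding method.

```python
import re


class ThinkTagCleaner:
    # Compiled once at class-definition time and reused on every call.
    _THINK_TAG_PATTERN = re.compile(
        r"<think>.*?</think>|</think>\s*$", re.DOTALL
    )

    @classmethod
    def clean(cls, text: str) -> str:
        """Remove <think> blocks and a trailing orphan </think> in one pass."""
        return cls._THINK_TAG_PATTERN.sub("", text).strip()
```

The single pass is not strictly equivalent to the two sequential substitutions. An orphan `</think>` that only becomes trailing after a later block is removed, as in `</think><think>x</think>`, survives the merged pattern, whereas the two-step version removes it. Whether that input occurs in practice is a judgment call for the PR author.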
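What does this PR do?
Fixes #6437
Some underlying models (such as DeepSeek-R1) still include chain-of-thought content in `<think>...</think>` form in their output, even when thinking mode is turned off on the Dify side. Because the Dify Runner did not filter this content, the tags were passed through to the user verbatim.
Changes
- Add a `_strip_think_tags()` static method in `dify_agent_runner.py` that removes `<think>...</think>` blocks and stray orphan `</think>` tags, consistent with the existing handling in `openai_source._parse_openai_completion`.
- Apply the filter on all output paths of `parse_dify_result()`, including chat/agent/chatflow string results and workflow text/list/fallback outputs.
- Apply the same filter to streaming deltas, so `<think>` content is no longer printed verbatim to the user in streaming mode.
Summary by Sourcery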
Strip unintended chain-of-thought tags from Dify runner responses before delivering them to users.
Bug Fixes: