Skip to content

fix: resolve CodeQL false-positive cleartext logging alert in PR #6660#6736

Open
rin259 wants to merge 5 commits intoAstrBotDevs:masterfrom
rin259:fix/codeql-cleartext-logging
Open

fix: resolve CodeQL false-positive cleartext logging alert in PR #6660#6736
rin259 wants to merge 5 commits intoAstrBotDevs:masterfrom
rin259:fix/codeql-cleartext-logging

Conversation

@rin259
Copy link
Contributor

@rin259 rin259 commented Mar 21, 2026

Summary

This PR fixes a CodeQL false-positive alert (pycleartext-logging-sensitive-data) in PR #6660.

Issue

In _run_compression(), the parameter named prev_tokens was flagged as logging sensitive data (secret) in cleartext — CodeQL's heuristic marks any variable with "token" in the name as potentially sensitive.

The value is a plain integer token count, not an authentication token.

Fix

Renamed prev_tokens to token_count_before throughout the function. This clears the heuristic-based alert while preserving identical functionality.

Files changed

  • astrbot/core/agent/context/manager.py: 3-line rename in parameter, docstring, and log format string.

Note: This is a follow-up fix for PR #6660. Once both are merged, the CodeQL check should pass.

Summary by Sourcery

Optimize context compression, token counting, and LLM-based summarization with caching and improved token estimation, while exposing basic stats and maintenance helpers.

New Features:

  • Expose statistics and reset helpers for the context manager and token counter to inspect compression and caching behavior.

Bug Fixes:

  • Avoid repeated token counting and redundant compression passes in context management to reduce unnecessary work and potential inconsistencies.

Enhancements:

  • Improve token estimation logic for mixed-language text and account for per-message overhead in token counting.
  • Add caching to token counting and LLM-based summary generation using stronger cache keys derived from full message history.
  • Refine context compression flow with incremental compression support, better logging, and integration of cached token counts.
  • Document behavior and optimizations in context manager, token counter, and compressor components with clearer docstrings and comments.

Tests:

  • Add comprehensive tests for token estimation, cache behavior, context splitting, LLM summary compression (including caching), and context manager processing paths.

rin and others added 5 commits March 20, 2026 11:13
- Token 估算算法改进(中英数特字符分别计算)
- 添加 Token 计数缓存和摘要缓存
- ContextManager 添加指纹机制减少重复计算
- 修复缓存键碰撞和 overhead 重复计算 bug
CodeQL flags 'Clear-text logging of sensitive information' on prev_tokens
in logger.info(), as the parameter name matches secret-detection heuristics.

The value is a plain token count (integer), not an auth token.
Rename to token_count_before to clear the alert.
@dosubot dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Mar 21, 2026
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on optimizing the context management system within the astrbot agent. It introduces robust caching mechanisms for both token counting and LLM-based summary generation, aiming to reduce computational overhead and improve response times. Additionally, it addresses a specific CodeQL false-positive alert by clarifying a variable name, ensuring code quality checks pass correctly without altering functionality.

Highlights

  • CodeQL False-Positive Fix: Resolved a CodeQL false-positive alert (pycleartext-logging-sensitive-data) by renaming the prev_tokens parameter to token_count_before in _run_compression within astrbot/core/agent/context/manager.py. This variable was incorrectly flagged as sensitive data, but it represents an integer token count.
  • Token Counting Optimization: Implemented a caching mechanism in EstimateTokenCounter to avoid redundant token count calculations for unchanged message lists, significantly improving performance. This includes a new _get_cache_key method and cache hit/miss statistics.
  • LLM Summary Caching: Introduced caching for LLM-generated summaries in LLMSummaryCompressor to prevent re-generating summaries for identical conversation histories, enhancing efficiency.
  • Context Manager Enhancements: The ContextManager was updated to leverage the new token counting and summary caching, reducing duplicate work and improving overall context processing performance. New methods for retrieving and resetting statistics were also added.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@dosubot dosubot bot added the area:core The bug / feature is about astrbot's core, backend label Mar 21, 2026
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces significant optimizations to the context management system, including caching for token counting and LLM-based summarization, improved token estimation, and more robust context compression logic. The addition of comprehensive tests for these new features is also a great improvement.

However, the pull request description is misleading as it only mentions a minor variable rename to fix a CodeQL alert, while the actual changes are much more substantial. It would be beneficial to update the description to accurately reflect the scope of the work done.

I've identified a few areas for improvement:

  • There's a potential correctness issue in the summary cache key generation that could lead to collisions.
  • The context manager relies on a private method of the token counter, breaking encapsulation.
  • The caching eviction strategies in both the summary compressor and token counter could be improved for better performance and effectiveness.
  • There is inconsistent use of Chinese and English in docstrings and comments, which could affect maintainability.

Details are in the specific comments.

key_parts = []
for msg in messages:
content = msg.content if isinstance(msg.content, str) else str(msg.content)
key_parts.append(f"{msg.role}:{content[:50]}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current cache key generation truncates message content to 50 characters. This can lead to key collisions where different message histories produce the same key, causing incorrect cache hits. This is a correctness issue as it could lead to returning a summary for a different conversation. To create a truly collision-resistant key as stated in the docstring, consider using a hash of the full content of each message. You'll need to import hashlib.

Suggested change
key_parts.append(f"{msg.role}:{content[:50]}")
key_parts.append(f"{msg.role}:{hashlib.sha256(content.encode('utf-8')).hexdigest()}")

return 0

# 使用 token counter 的缓存键作为指纹
return self.token_counter._get_cache_key(messages)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This method calls a private method _get_cache_key on self.token_counter. This breaks encapsulation and creates a strong coupling with the internal implementation of EstimateTokenCounter. If a different implementation of TokenCounter is used (which is allowed by the configuration), this code will likely fail with an AttributeError.

To fix this, _get_cache_key should be made a public method (e.g., get_messages_fingerprint) and added to the TokenCounter protocol if it's intended to be part of its public API.

Comment on lines +169 to 173
Optimizations:
- 支持增量摘要,只摘要超出的部分
- 添加摘要缓存避免重复摘要
- 支持自定义摘要提示词
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The docstrings, comments, and test names in this PR are a mix of English and Chinese. While this might be acceptable for an internal team, it can make the code harder to maintain for a wider audience or future contributors who may not be fluent in both languages. For consistency and broader accessibility, it would be better to stick to a single language, preferably English, which seems to be the primary language for the rest of the project's code.

# 尝试从缓存获取摘要
summary_content = None
if cache_key in self._summary_cache:
summary_content = self._summary_cache[cache_key]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To implement a more effective LRU cache policy, you should move the accessed item to the end of the dictionary on a cache hit. This ensures frequently used items are not prematurely evicted.

Suggested change
summary_content = self._summary_cache[cache_key]
summary_content = self._summary_cache.pop(cache_key)
self._summary_cache[cache_key] = summary_content

Comment on lines +113 to +117
# 简单的缓存淘汰: 清空一半
keys_to_remove = list(self._cache.keys())[: self._cache_size // 2]
for key in keys_to_remove:
del self._cache[key]
self._cache[cache_key] = total
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The cache eviction policy of clearing half the cache when it becomes full is inefficient. This will lead to a sudden drop in performance whenever the cache is half-cleared, as many subsequent requests will result in cache misses. A more standard approach is to evict a single item at a time, for example, the first one added (FIFO). This provides more consistent performance.

            # Simple FIFO eviction: remove the oldest item.
            self._cache.pop(next(iter(self._cache)))
            self._cache[cache_key] = total

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 4 issues, and left some high level feedback:

  • The PR description/title suggests a small variable rename for a CodeQL false positive, but the diff introduces substantial behavioral changes (caching, new stats, new compressors, tests); consider splitting these optimizations into a separate PR to keep changes focused and easier to review/rollback.
  • ContextManager._get_messages_fingerprint currently uses self.token_counter._get_cache_key(messages) and tests access several underscored helpers (e.g., _get_cache_key, _estimate_tokens, _generate_summary_cache_key), which tightly couples behavior to internals; exposing small public helpers or interfaces for these concerns would make the design more stable.
  • The summary cache key in _generate_summary_cache_key truncates content to 50 characters and concatenates role/content strings, which increases the chance of collisions as histories grow; consider using a structured hash (e.g., hashing the full role/content/tool_calls tuple) to reduce collision risk while keeping keys bounded.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The PR description/title suggests a small variable rename for a CodeQL false positive, but the diff introduces substantial behavioral changes (caching, new stats, new compressors, tests); consider splitting these optimizations into a separate PR to keep changes focused and easier to review/rollback.
- ContextManager._get_messages_fingerprint currently uses self.token_counter._get_cache_key(messages) and tests access several underscored helpers (e.g., _get_cache_key, _estimate_tokens, _generate_summary_cache_key), which tightly couples behavior to internals; exposing small public helpers or interfaces for these concerns would make the design more stable.
- The summary cache key in _generate_summary_cache_key truncates content to 50 characters and concatenates role/content strings, which increases the chance of collisions as histories grow; consider using a structured hash (e.g., hashing the full role/content/tool_calls tuple) to reduce collision risk while keeping keys bounded.

## Individual Comments

### Comment 1
<location path="astrbot/core/agent/context/manager.py" line_range="60-61" />
<code_context>
+        if not messages:
+            return 0
+
+        # 使用 token counter 的缓存键作为指纹
+        return self.token_counter._get_cache_key(messages)
+
     async def process(
</code_context>
<issue_to_address>
**issue:** Using a private method on `token_counter` for fingerprints couples `ContextManager` to internal implementation details.

Using `self.token_counter._get_cache_key` hard-codes `ContextManager` to a specific `EstimateTokenCounter` implementation and its private API, so changes or swaps will likely break this. Consider either adding a public `get_fingerprint`/`get_cache_key` API on the token counter, or moving the fingerprint logic into `ContextManager` so it doesn’t depend on a private method.
</issue_to_address>

### Comment 2
<location path="astrbot/core/agent/context/manager.py" line_range="88-89" />
<code_context>
-                total_tokens = self.token_counter.count_tokens(
-                    result, trusted_token_usage
-                )
+                # 优化: 使用缓存的 token 计数或计算新值
+                current_fingerprint = self._get_messages_fingerprint(messages)
+
+                if trusted_token_usage > 0:
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Fingerprint is computed from the original `messages` while token counting uses `result`, which may cause cache inconsistencies.

Because `_last_messages_fingerprint` is later set from `result` (after compression), future calls will compare against a fingerprint derived from different data than the one used for `total_tokens`. This inconsistency can weaken cache usefulness or cause unexpected cache hits/misses. Consider standardizing on a single source (e.g., always `result`) for both fingerprinting and token counting when updating and checking cache state.

```suggestion
                # 优化: 使用缓存的 token 计数或计算新值
                # 使用 result 作为指纹来源,以与 token 统计和缓存状态保持一致
                current_fingerprint = self._get_messages_fingerprint(result)
```
</issue_to_address>

### Comment 3
<location path="astrbot/core/agent/context/compressor.py" line_range="149-158" />
<code_context>
     return system_messages, messages_to_summarize, recent_messages


+def _generate_summary_cache_key(messages: list[Message]) -> str:
+    """Generate a cache key for summary based on full history.
+
+    Uses role and content from all messages to create a collision-resistant key.
+    """
+    if not messages:
+        return ""
+
+    key_parts = []
+    for msg in messages:
+        content = msg.content if isinstance(msg.content, str) else str(msg.content)
+        key_parts.append(f"{msg.role}:{content[:50]}")
+
+    return "|".join(key_parts)
</code_context>
<issue_to_address>
**issue (bug_risk):** Summary cache key truncates content to 50 chars and may not match the "full history" / collision-resistant intent.

The implementation only uses each message’s role plus the first 50 chars of `content`, which conflicts with the docstring’s “full history” and “collision-resistant” guarantees. Different histories with shared prefixes or complex/multimodal content can collide and reuse the wrong summary. Please either relax the docstring to reflect this heuristic, or update the key (e.g., hash of full content + relevant metadata like tool calls) to better match the described guarantees.
</issue_to_address>

### Comment 4
<location path="tests/test_context_compression.py" line_range="145-154" />
<code_context>
+class TestTruncateByTurnsCompressor:
</code_context>
<issue_to_address>
**suggestion (testing):** Add tests for the actual truncation behavior of TruncateByTurnsCompressor.__call__

Right now we only test `should_compress`, but not the actual truncation performed in `__call__` via `ContextTruncator.truncate_by_dropping_oldest_turns`. Please add at least one test that does `await compressor(messages)` and asserts that (a) the oldest turns are removed and (b) the resulting message count matches the expectation for a given `truncate_turns` value (e.g., `truncate_turns=1`).

Suggested implementation:

```python
    def setup_method(self):
        """Setup test fixtures."""
        self.compressor = TruncateByTurnsCompressor(truncate_turns=1)

    @pytest.mark.asyncio
    async def test_call_truncates_by_dropping_oldest_turns(self):
        """`__call__` 应按 truncate_turns 丢弃最老的对话轮次。"""
        # 3 个对话轮次,每轮为 (user, assistant)
        messages = [
            Message(role="user", content="u1"),
            Message(role="assistant", content="a1"),
            Message(role="user", content="u2"),
            Message(role="assistant", content="a2"),
            Message(role="user", content="u3"),
            Message(role="assistant", content="a3"),
        ]

        # truncate_turns=1 时,应只保留最新的一轮 (u3, a3)
        compressed_messages = await self.compressor(messages)

        assert len(compressed_messages) == 2
        assert [m.content for m in compressed_messages] == ["u3", "a3"]

    def test_should_compress_above_threshold(self):
        """测试超过阈值时触发压缩。"""
        messages = [
            Message(role="user", content="测试消息"),

```

1. Ensure `pytest` is imported in this test module (e.g., `import pytest`) if it is not already.
2. This test assumes that `truncate_turns` 的单位是“对话轮次(user + assistant)”,且 `TruncateByTurnsCompressor.__call__` 返回被 `ContextTruncator.truncate_by_dropping_oldest_turns` 处理后的消息列表;如果实现略有不同(例如包含 system 消息或定义“turn”不同),请相应调整预期的消息数和内容断言。
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +60 to +61
# 使用 token counter 的缓存键作为指纹
return self.token_counter._get_cache_key(messages)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: Using a private method on token_counter for fingerprints couples ContextManager to internal implementation details.

Using self.token_counter._get_cache_key hard-codes ContextManager to a specific EstimateTokenCounter implementation and its private API, so changes or swaps will likely break this. Consider either adding a public get_fingerprint/get_cache_key API on the token counter, or moving the fingerprint logic into ContextManager so it doesn’t depend on a private method.

Comment on lines +88 to +89
# 优化: 使用缓存的 token 计数或计算新值
current_fingerprint = self._get_messages_fingerprint(messages)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (bug_risk): Fingerprint is computed from the original messages while token counting uses result, which may cause cache inconsistencies.

Because _last_messages_fingerprint is later set from result (after compression), future calls will compare against a fingerprint derived from different data than the one used for total_tokens. This inconsistency can weaken cache usefulness or cause unexpected cache hits/misses. Consider standardizing on a single source (e.g., always result) for both fingerprinting and token counting when updating and checking cache state.

Suggested change
# 优化: 使用缓存的 token 计数或计算新值
current_fingerprint = self._get_messages_fingerprint(messages)
# 优化: 使用缓存的 token 计数或计算新值
# 使用 result 作为指纹来源,以与 token 统计和缓存状态保持一致
current_fingerprint = self._get_messages_fingerprint(result)

Comment on lines +149 to +158
def _generate_summary_cache_key(messages: list[Message]) -> str:
"""Generate a cache key for summary based on full history.

Uses role and content from all messages to create a collision-resistant key.
"""
if not messages:
return ""

key_parts = []
for msg in messages:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): Summary cache key truncates content to 50 chars and may not match the "full history" / collision-resistant intent.

The implementation only uses each message’s role plus the first 50 chars of content, which conflicts with the docstring’s “full history” and “collision-resistant” guarantees. Different histories with shared prefixes or complex/multimodal content can collide and reuse the wrong summary. Please either relax the docstring to reflect this heuristic, or update the key (e.g., hash of full content + relevant metadata like tool calls) to better match the described guarantees.

Comment on lines +145 to +154
class TestTruncateByTurnsCompressor:
"""Test cases for truncate by turns compressor."""

def setup_method(self):
"""Setup test fixtures."""
self.compressor = TruncateByTurnsCompressor(truncate_turns=1)

def test_should_compress_above_threshold(self):
"""测试超过阈值时触发压缩。"""
messages = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (testing): Add tests for the actual truncation behavior of TruncateByTurnsCompressor.call

Right now we only test should_compress, but not the actual truncation performed in __call__ via ContextTruncator.truncate_by_dropping_oldest_turns. Please add at least one test that does await compressor(messages) and asserts that (a) the oldest turns are removed and (b) the resulting message count matches the expectation for a given truncate_turns value (e.g., truncate_turns=1).

Suggested implementation:

    def setup_method(self):
        """Setup test fixtures."""
        self.compressor = TruncateByTurnsCompressor(truncate_turns=1)

    @pytest.mark.asyncio
    async def test_call_truncates_by_dropping_oldest_turns(self):
        """`__call__` 应按 truncate_turns 丢弃最老的对话轮次。"""
        # 3 个对话轮次,每轮为 (user, assistant)
        messages = [
            Message(role="user", content="u1"),
            Message(role="assistant", content="a1"),
            Message(role="user", content="u2"),
            Message(role="assistant", content="a2"),
            Message(role="user", content="u3"),
            Message(role="assistant", content="a3"),
        ]

        # truncate_turns=1 时,应只保留最新的一轮 (u3, a3)
        compressed_messages = await self.compressor(messages)

        assert len(compressed_messages) == 2
        assert [m.content for m in compressed_messages] == ["u3", "a3"]

    def test_should_compress_above_threshold(self):
        """测试超过阈值时触发压缩。"""
        messages = [
            Message(role="user", content="测试消息"),
  1. Ensure pytest is imported in this test module (e.g., import pytest) if it is not already.
  2. This test assumes that truncate_turns 的单位是“对话轮次(user + assistant)”,且 TruncateByTurnsCompressor.__call__ 返回被 ContextTruncator.truncate_by_dropping_oldest_turns 处理后的消息列表;如果实现略有不同(例如包含 system 消息或定义“turn”不同),请相应调整预期的消息数和内容断言。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:core The bug / feature is about astrbot's core, backend size:XL This PR changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant