
feat(provider): add GLM-ASR and GLM-TTS providers#6603

Open
xwsjjctz wants to merge 4 commits into AstrBotDevs:master from xwsjjctz:feat/add-glm-asr-and-tts

Conversation

@xwsjjctz
Contributor

@xwsjjctz xwsjjctz commented Mar 19, 2026

Added BigModel's speech-to-text and text-to-speech services.

Modifications / 改动点

Added the following two files:
astrbot/core/provider/sources/glm_asr_source.py
astrbot/core/provider/sources/glm_tts_source.py

astrbot/core/provider/manager.py - add dynamic imports
astrbot/core/config/default.py - add default configuration
dashboard/src/composables/useProviderSources.ts - add type mapping
dashboard/src/i18n/locales/en-US/features/config-metadata.json - i18n
dashboard/src/i18n/locales/zh-CN/features/config-metadata.json - i18n
astrbot/core/message/components.py - added a url field; the QQ adapter calls Record.fromURL(url), but the field was missing there, so STT processing would be skipped

  • This is NOT a breaking change. / 这不是一个破坏性变更。

Screenshots or Test Results / 运行截图或测试结果

Test results:
PixPin_2026-03-19_13-34-41

Logs:

[13:32:48.657] [Core] [INFO] [core.event_bus:66]: [default] [azi(qq_official)] FA6AA95C7A5791BE887514B73CBAE9EB: [At:qq_official]  [ComponentType.Record]
[13:32:48.765] [Core] [INFO] [sources.glm_asr_source:76]: Converting silk file to wav for GLM-ASR...
[13:32:49.737] [Core] [INFO] [preprocess_stage.stage:86]: 语音转文本结果: 连接测试
[13:32:52.003] [Core] [INFO] [respond.stage:184]: Prepare to send - /FA6AA95C7A5791BE887514B73CBAE9EB: [引用消息] ✅
[13:33:01.716] [Core] [INFO] [core.event_bus:66]: [default] [azi(qq_official)] FA6AA95C7A5791BE887514B73CBAE9EB: [At:qq_official]  [ComponentType.Record]
[13:33:01.820] [Core] [INFO] [sources.glm_asr_source:76]: Converting silk file to wav for GLM-ASR...
[13:33:02.636] [Core] [INFO] [preprocess_stage.stage:86]: 语音转文本结果: 你还是说句话吧
[13:33:05.238] [Core] [INFO] [result_decorate.stage:286]: TTS 请求: 好的好的!本梓收到连接测试啦!🎮

有什么需要帮忙的吗?
[13:33:07.418] [Core] [INFO] [result_decorate.stage:288]: TTS 结果: /Users/huangxizhi/workspace/AstrBot/data/temp/glm_tts_85e0542f-ad5c-4822-9f05-2b3b62b53058.wav
[13:33:07.419] [Core] [INFO] [respond.stage:184]: Prepare to send - /FA6AA95C7A5791BE887514B73CBAE9EB: [ComponentType.Record]

Checklist / 检查清单

  • 😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
    / 如果 PR 中有新加入的功能,已经通过 Issue / 邮件等方式和作者讨论过。

  • 👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
    / 我的更改经过了良好的测试,并已在上方提供了“验证步骤”和“运行截图”

  • 🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.
    / 我确保没有引入新依赖库,或者引入了新依赖库的同时将其添加到 requirements.txt 和 pyproject.toml 文件相应位置。

  • 😮 My changes do not introduce malicious code.
    / 我的更改没有引入恶意代码。

Summary by Sourcery

Add BigModel GLM-based speech-to-text and text-to-speech providers and wire them into core configuration and the dashboard, along with minor message component adjustments for audio URLs.

New Features:

  • Introduce GLM-ASR speech-to-text provider backed by BigModel audio transcription API.
  • Introduce GLM-TTS text-to-speech provider backed by BigModel audio speech API.

Enhancements:

  • Extend default provider configuration and dashboard provider-type mapping to support the new GLM-ASR and GLM-TTS providers.
  • Augment the Record message component to retain a URL field so audio messages from different sources can be correctly processed by STT flows.

Documentation:

  • Add i18n configuration metadata entries for the new GLM-ASR and GLM-TTS providers in both English and Chinese locales.

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Mar 19, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the bot's voice interaction features by integrating new Speech-to-Text (ASR) and Text-to-Speech (TTS) capabilities powered by GLM models. It introduces the necessary backend logic for processing audio inputs and generating speech outputs, along with corresponding configuration and UI updates to support these new providers. The changes aim to provide more robust and flexible voice interaction options for users.

Highlights

  • New Provider Integrations: Integrated GLM-ASR for Speech-to-Text and GLM-TTS for Text-to-Speech, expanding the bot's voice interaction capabilities.
  • Configuration Updates: Added default configurations for the new GLM-ASR and GLM-TTS providers in the system's default settings.
  • Message Component Enhancement: Modified the Record message component to consistently include a url field, improving compatibility for Speech-to-Text processing, especially for QQ adapter calls.
  • Dynamic Provider Loading: Updated the provider manager to dynamically import the new GLM-ASR and GLM-TTS modules.
  • Dashboard UI and Localization: Updated the dashboard's provider source mapping and added internationalization (i18n) entries for GLM-TTS configuration options in both English and Chinese.

@dosubot dosubot bot added the area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. label Mar 19, 2026
Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey - I've left some high level feedback:

  • In ProviderGLMASR.get_text, audio_url is reused for both the downloaded .input file and the converted .wav, so the finally block never deletes the downloaded temp file when a conversion occurs; keep the original download path in a separate variable so both temporary files can be cleaned up correctly.
  • Both ProviderGLMASR and ProviderGLMTTS proceed even if api_key is empty; consider validating configuration in __init__ (or before the first request) and raising a clear error when required fields like api_key are missing to avoid confusing runtime failures.
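A minimal sketch of the second point (failing fast when api_key is missing). The class and config-key names mirror the PR, but this is a hypothetical illustration under those assumptions, not the actual implementation:

```python
# Hypothetical sketch: validate required config in __init__ instead of
# letting the first API request fail with a confusing runtime error.
# "api_key" is assumed to be the provider_config key used in this PR.
class ProviderGLMASR:
    def __init__(self, provider_config: dict) -> None:
        api_key = provider_config.get("api_key", "")
        if not api_key:
            # Raise a clear, early error for missing required configuration.
            raise ValueError(
                "GLM-ASR provider requires a non-empty 'api_key' in its configuration"
            )
        self.api_key: str = api_key
```

The same pattern would apply to ProviderGLMTTS, or could live in a shared base-class helper if the codebase has one.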
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `ProviderGLMASR.get_text`, `audio_url` is reused for both the downloaded `.input` file and the converted `.wav`, so the `finally` block never deletes the downloaded temp file when a conversion occurs; keep the original download path in a separate variable so both temporary files can be cleaned up correctly.
- Both `ProviderGLMASR` and `ProviderGLMTTS` proceed even if `api_key` is empty; consider validating configuration in `__init__` (or before the first request) and raising a clear error when required fields like `api_key` are missing to avoid confusing runtime failures.


Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This PR adds support for Zhipu (BigModel)'s GLM-ASR (speech-to-text) and GLM-TTS (text-to-speech) services, including the core provider implementations, default configuration, dynamic imports, and the dashboard UI adaptation. The overall implementation is complete.

I found a few areas for improvement:

  1. In glm_asr_source.py, there is a potential resource leak when handling audio files from remote URLs: if the downloaded temp file needs a format conversion, the originally downloaded file is not deleted after processing.
  2. In glm_tts_source.py, the speed and volume parameters read from the configuration lack range validation, which may send invalid values to the API.

I have left concrete code suggestions for these issues. Fixing them will improve the code's robustness and resource handling.

Comment on lines +53 to +126
    async def get_text(self, audio_url: str) -> str:
        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json",
        }

        output_path = None

        if audio_url.startswith("http"):
            temp_dir = get_astrbot_temp_path()
            local_path = os.path.join(temp_dir, f"glm_asr_{uuid.uuid4().hex[:8]}.input")
            await download_file(audio_url, local_path)
            audio_url = local_path

        if not os.path.exists(audio_url):
            raise FileNotFoundError(f"Audio file not found: {audio_url}")

        file_format = self._get_audio_format(audio_url)

        if file_format in ["silk", "amr"]:
            temp_dir = get_astrbot_temp_path()
            output_path = os.path.join(temp_dir, f"glm_asr_{uuid.uuid4().hex[:8]}.wav")

            logger.info(f"Converting {file_format} file to wav for GLM-ASR...")
            if file_format == "silk":
                await tencent_silk_to_wav(audio_url, output_path)
            elif file_format == "amr":
                await convert_to_pcm_wav(audio_url, output_path)

            audio_url = output_path

        with open(audio_url, "rb") as f:
            audio_base64 = base64.b64encode(f.read()).decode("utf-8")

        payload = {
            "model": self.model_name,
            "file_base64": audio_base64,
        }

        try:
            async with aiohttp.ClientSession() as session:
                async with session.post(
                    self.api_base,
                    headers=headers,
                    json=payload,
                    timeout=aiohttp.ClientTimeout(total=self.timeout),
                ) as response:
                    if response.status != 200:
                        error_text = await response.text()
                        logger.error(f"GLM-ASR API error: {response.status}, body: {error_text}")
                        response.raise_for_status()

                    result = await response.json()

                    if result.get("error"):
                        error_msg = result["error"].get("message", "Unknown error")
                        raise Exception(f"GLM-ASR API error: {error_msg}")

                    text = result.get("text", "")
                    return text

        except aiohttp.ClientError as e:
            raise Exception(f"GLM-ASR API request failed: {e!s}")
        finally:
            if output_path and os.path.exists(output_path):
                try:
                    os.remove(output_path)
                except Exception as e:
                    logger.warning(f"Failed to remove temp file {output_path}: {e}")
            if audio_url.endswith(".input") and os.path.exists(audio_url):
                try:
                    os.remove(audio_url)
                except Exception as e:
                    logger.warning(f"Failed to remove temp file {audio_url}: {e}")
Contributor


high

The current temp-file cleanup logic is flawed. When an audio file is downloaded from the network and also needs a format conversion (e.g. silk to wav), the originally downloaded temp file (the .input file) is never deleted after the operation, which leaks a resource. The cause is that the audio_url variable is reassigned, so the path of the originally downloaded file is lost.

To fix this, consider refactoring the get_text method to track and clean up all created temporary files more reliably.

    async def get_text(self, audio_url: str) -> str:
        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json",
        }

        temp_files_to_clean = []
        try:
            processing_path = audio_url
            if processing_path.startswith("http"):
                temp_dir = get_astrbot_temp_path()
                local_path = os.path.join(temp_dir, f"glm_asr_{uuid.uuid4().hex[:8]}.input")
                await download_file(processing_path, local_path)
                processing_path = local_path
                temp_files_to_clean.append(local_path)

            if not os.path.exists(processing_path):
                raise FileNotFoundError(f"Audio file not found: {processing_path}")

            file_format = self._get_audio_format(processing_path)

            if file_format in ["silk", "amr"]:
                temp_dir = get_astrbot_temp_path()
                output_path = os.path.join(temp_dir, f"glm_asr_{uuid.uuid4().hex[:8]}.wav")
                temp_files_to_clean.append(output_path)

                logger.info(f"Converting {file_format} file to wav for GLM-ASR...")
                if file_format == "silk":
                    await tencent_silk_to_wav(processing_path, output_path)
                elif file_format == "amr":
                    await convert_to_pcm_wav(processing_path, output_path)

                processing_path = output_path

            with open(processing_path, "rb") as f:
                audio_base64 = base64.b64encode(f.read()).decode("utf-8")

            payload = {
                "model": self.model_name,
                "file_base64": audio_base64,
            }

            async with aiohttp.ClientSession() as session:
                async with session.post(
                    self.api_base,
                    headers=headers,
                    json=payload,
                    timeout=aiohttp.ClientTimeout(total=self.timeout),
                ) as response:
                    if response.status != 200:
                        error_text = await response.text()
                        logger.error(f"GLM-ASR API error: {response.status}, body: {error_text}")
                        response.raise_for_status()

                    result = await response.json()

                    if result.get("error"):
                        error_msg = result["error"].get("message", "Unknown error")
                        raise Exception(f"GLM-ASR API error: {error_msg}")

                    text = result.get("text", "")
                    return text

        except aiohttp.ClientError as e:
            raise Exception(f"GLM-ASR API request failed: {e!s}")
        finally:
            for file_path in temp_files_to_clean:
                if os.path.exists(file_path):
                    try:
                        os.remove(file_path)
                    except Exception as e:
                        logger.warning(f"Failed to remove temp file {file_path}: {e}")

Comment on lines +28 to +29

        self.speed: float = provider_config.get("glm_tts_speed", 1.0)
        self.volume: float = provider_config.get("glm_tts_volume", 1.0)
Contributor


medium

The speed and volume parameters read from the configuration lack range validation. According to the API documentation, the valid range of speed is [0.5, 2.0] and that of volume is (0, 10]. If a user configures an out-of-range value, the API call may fail.

Consider validating and clamping these values in __init__ to make the code more robust.

Suggested change

-        self.speed: float = provider_config.get("glm_tts_speed", 1.0)
-        self.volume: float = provider_config.get("glm_tts_volume", 1.0)
+        self.speed: float = max(0.5, min(2.0, provider_config.get("glm_tts_speed", 1.0)))
+        self.volume: float = max(0.1, min(10.0, provider_config.get("glm_tts_volume", 1.0)))

@xwsjjctz
Contributor Author

@sourcery-ai review

Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 1 issue, and left some high level feedback:

  • In ProviderGLMTTS.get_audio, the strict response.content_type != "audio/wav" check may reject valid responses (e.g., audio/wave or audio/x-wav); consider relaxing this to a prefix or startswith("audio/") check or inspecting the Content-Type header more flexibly.
  • For both GLM-ASR and GLM-TTS providers, when the API returns a non-2xx status or an unexpected content type, capturing and logging the response body (in addition to raise_for_status) would make debugging provider/API issues much easier.
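A minimal sketch of the relaxed content-type check from the first point. The helper name is_audio_content_type is hypothetical, not from the PR:

```python
# Hypothetical helper: accept any audio/* subtype rather than only
# the exact string "audio/wav", so responses such as "audio/wave" or
# "audio/x-wav" are not rejected.
def is_audio_content_type(content_type: str) -> bool:
    # Strip any parameters (e.g. "; charset=binary") and normalize case
    # before comparing the media type prefix.
    media_type = content_type.split(";")[0].strip().lower()
    return media_type.startswith("audio/")
```

With this check, an audio/wave or audio/x-wav response would be accepted as audio, while a JSON error body (application/json) would still be caught and could be logged for debugging.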
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `ProviderGLMTTS.get_audio`, the strict `response.content_type != "audio/wav"` check may reject valid responses (e.g., `audio/wave` or `audio/x-wav`); consider relaxing this to a prefix or `startswith("audio/")` check or inspecting the `Content-Type` header more flexibly.
- For both GLM-ASR and GLM-TTS providers, when the API returns a non-2xx status or an unexpected content type, capturing and logging the response body (in addition to `raise_for_status`) would make debugging provider/API issues much easier.

## Individual Comments

### Comment 1
<location path="astrbot/core/provider/sources/glm_asr_source.py" line_range="100" />
<code_context>
+        }
+
+        try:
+            async with aiohttp.ClientSession() as session:
+                async with session.post(
+                    self.api_base,
</code_context>
<issue_to_address>
**suggestion (performance):** Consider reusing an aiohttp.ClientSession instead of creating one per request to reduce connection overhead.

Creating a new ClientSession on every `get_text` call adds avoidable connection setup/teardown overhead, especially under frequent or concurrent use. Since the provider is longer-lived, consider a lazily initialized shared session on the instance and close it in `terminate` to reduce latency and socket churn.

Suggested implementation:

```python
        try:
            if not hasattr(self, "_session") or self._session is None or self._session.closed:
                # Lazily initialize a shared aiohttp ClientSession to avoid
                # per-request creation/teardown overhead.
                self._session = aiohttp.ClientSession()

            session = self._session
            async with session.post(
                    self.api_base,
                    headers=headers,
                    json=payload,
                    timeout=aiohttp.ClientTimeout(total=self.timeout),
                ) as response:
                    if response.status != 200:
                        error_text = await response.text()
                        logger.error(
                            f"GLM-ASR API error: {response.status}, body: {error_text}"
                        )
                        response.raise_for_status()

```

To fully implement the shared-session approach, you should also:
1. Initialize the attribute in the provider's constructor, e.g. in `__init__` of the GLM-ASR provider class:
   - `self._session: Optional[aiohttp.ClientSession] = None`
2. Ensure the session is closed when the provider is torn down, e.g. in a `terminate`/`close`/`__aexit__` method:
   - 
   ```python
   if hasattr(self, "_session") and self._session is not None and not self._session.closed:
       await self._session.close()
   ```
3. If your base provider class already defines lifecycle hooks, wire this cleanup into those hooks instead of creating a new one, to match the existing conventions in the codebase.
</issue_to_address>


