feat(provider): add GLM-ASR and GLM-TTS providers #6603
xwsjjctz wants to merge 4 commits into AstrBotDevs:master
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly enhances the bot's voice interaction features by integrating new Speech-to-Text (ASR) and Text-to-Speech (TTS) capabilities powered by GLM models. It introduces the necessary backend logic for processing audio inputs and generating speech outputs, along with corresponding configuration and UI updates to support these new providers. The changes aim to provide more robust and flexible voice interaction options for users.
Hey - I've left some high level feedback:
- In `ProviderGLMASR.get_text`, `audio_url` is reused for both the downloaded `.input` file and the converted `.wav`, so the `finally` block never deletes the downloaded temp file when a conversion occurs; keep the original download path in a separate variable so both temporary files can be cleaned up correctly.
- Both `ProviderGLMASR` and `ProviderGLMTTS` proceed even if `api_key` is empty; consider validating configuration in `__init__` (or before the first request) and raising a clear error when required fields like `api_key` are missing to avoid confusing runtime failures.
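The `api_key` point could be handled with a small fail-fast check at construction time. The following is only a sketch under assumptions: `ProviderConfigError`, `require_setting`, and `ExampleProvider` are hypothetical names, not part of AstrBot, and the config key `"api_key"` is taken from the wording of the review rather than from the PR's code.

```python
class ProviderConfigError(ValueError):
    """Raised when a required provider setting is missing (hypothetical name)."""


def require_setting(provider_config: dict, key: str) -> str:
    """Return a non-empty string setting or raise a clear error."""
    value = provider_config.get(key)
    # Treat missing, non-string, and whitespace-only values the same way.
    if not isinstance(value, str) or not value.strip():
        raise ProviderConfigError(f"Missing required provider setting: {key!r}")
    return value.strip()


class ExampleProvider:
    """Stand-in for a provider class; not the PR's actual implementation."""

    def __init__(self, provider_config: dict) -> None:
        # Fail at construction instead of on the first API request.
        self.api_key = require_setting(provider_config, "api_key")
```

With a check like this, a misconfigured provider fails when it is loaded, with a message naming the missing key, rather than surfacing later as an authentication error from the API.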
Code Review
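This PR adds support for the Zhipu (BigModel) GLM-ASR (speech-to-text) and GLM-TTS (text-to-speech) services, including the core provider implementations, default configuration, dynamic imports, and the frontend UI changes. The overall implementation is complete.
I found a few things that could be improved:
- In `glm_asr_source.py`, there is a potential resource leak when handling audio files from a remote URL. If the downloaded temporary file needs to be converted to another format, the original download is not deleted after processing.
- In `glm_tts_source.py`, the `speed` and `volume` parameters read from the configuration are not range-checked, which may send invalid values to the API.
I have proposed specific code changes for these issues. Fixing them will make the code more robust and reduce wasted resources.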
```python
    async def get_text(self, audio_url: str) -> str:
        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json",
        }

        output_path = None

        if audio_url.startswith("http"):
            temp_dir = get_astrbot_temp_path()
            local_path = os.path.join(temp_dir, f"glm_asr_{uuid.uuid4().hex[:8]}.input")
            await download_file(audio_url, local_path)
            audio_url = local_path

        if not os.path.exists(audio_url):
            raise FileNotFoundError(f"Audio file not found: {audio_url}")

        file_format = self._get_audio_format(audio_url)

        if file_format in ["silk", "amr"]:
            temp_dir = get_astrbot_temp_path()
            output_path = os.path.join(temp_dir, f"glm_asr_{uuid.uuid4().hex[:8]}.wav")

            logger.info(f"Converting {file_format} file to wav for GLM-ASR...")
            if file_format == "silk":
                await tencent_silk_to_wav(audio_url, output_path)
            elif file_format == "amr":
                await convert_to_pcm_wav(audio_url, output_path)

            audio_url = output_path

        with open(audio_url, "rb") as f:
            audio_base64 = base64.b64encode(f.read()).decode("utf-8")

        payload = {
            "model": self.model_name,
            "file_base64": audio_base64,
        }

        try:
            async with aiohttp.ClientSession() as session:
                async with session.post(
                    self.api_base,
                    headers=headers,
                    json=payload,
                    timeout=aiohttp.ClientTimeout(total=self.timeout),
                ) as response:
                    if response.status != 200:
                        error_text = await response.text()
                        logger.error(f"GLM-ASR API error: {response.status}, body: {error_text}")
                        response.raise_for_status()

                    result = await response.json()

                    if result.get("error"):
                        error_msg = result["error"].get("message", "Unknown error")
                        raise Exception(f"GLM-ASR API error: {error_msg}")

                    text = result.get("text", "")
                    return text

        except aiohttp.ClientError as e:
            raise Exception(f"GLM-ASR API request failed: {e!s}")
        finally:
            if output_path and os.path.exists(output_path):
                try:
                    os.remove(output_path)
                except Exception as e:
                    logger.warning(f"Failed to remove temp file {output_path}: {e}")
            if audio_url.endswith(".input") and os.path.exists(audio_url):
                try:
                    os.remove(audio_url)
                except Exception as e:
                    logger.warning(f"Failed to remove temp file {audio_url}: {e}")
```
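The current temporary-file cleanup logic is flawed. When an audio file is downloaded from the network and needs format conversion (for example, from silk to wav), the originally downloaded temporary file (the `.input` file) is not deleted once the operation finishes, which leaks resources. The cause is that the `audio_url` variable is reassigned, so the path of the originally downloaded file is lost.
To fix this, consider refactoring the `get_text` method so that it reliably tracks and cleans up every temporary file it creates.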
```python
    async def get_text(self, audio_url: str) -> str:
        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json",
        }
        temp_files_to_clean = []
        try:
            processing_path = audio_url
            if processing_path.startswith("http"):
                temp_dir = get_astrbot_temp_path()
                local_path = os.path.join(temp_dir, f"glm_asr_{uuid.uuid4().hex[:8]}.input")
                await download_file(processing_path, local_path)
                processing_path = local_path
                temp_files_to_clean.append(local_path)
            if not os.path.exists(processing_path):
                raise FileNotFoundError(f"Audio file not found: {processing_path}")
            file_format = self._get_audio_format(processing_path)
            if file_format in ["silk", "amr"]:
                temp_dir = get_astrbot_temp_path()
                output_path = os.path.join(temp_dir, f"glm_asr_{uuid.uuid4().hex[:8]}.wav")
                temp_files_to_clean.append(output_path)
                logger.info(f"Converting {file_format} file to wav for GLM-ASR...")
                if file_format == "silk":
                    await tencent_silk_to_wav(processing_path, output_path)
                elif file_format == "amr":
                    await convert_to_pcm_wav(processing_path, output_path)
                processing_path = output_path
            with open(processing_path, "rb") as f:
                audio_base64 = base64.b64encode(f.read()).decode("utf-8")
            payload = {
                "model": self.model_name,
                "file_base64": audio_base64,
            }
            async with aiohttp.ClientSession() as session:
                async with session.post(
                    self.api_base,
                    headers=headers,
                    json=payload,
                    timeout=aiohttp.ClientTimeout(total=self.timeout),
                ) as response:
                    if response.status != 200:
                        error_text = await response.text()
                        logger.error(f"GLM-ASR API error: {response.status}, body: {error_text}")
                        response.raise_for_status()
                    result = await response.json()
                    if result.get("error"):
                        error_msg = result["error"].get("message", "Unknown error")
                        raise Exception(f"GLM-ASR API error: {error_msg}")
                    text = result.get("text", "")
                    return text
        except aiohttp.ClientError as e:
            raise Exception(f"GLM-ASR API request failed: {e!s}")
        finally:
            for file_path in temp_files_to_clean:
                if os.path.exists(file_path):
                    try:
                        os.remove(file_path)
                    except Exception as e:
                        logger.warning(f"Failed to remove temp file {file_path}: {e}")
```
```python
        self.speed: float = provider_config.get("glm_tts_speed", 1.0)
        self.volume: float = provider_config.get("glm_tts_volume", 1.0)
```
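The `speed` and `volume` parameters read from the configuration are not range-checked. According to the API documentation, the valid range for `speed` is [0.5, 2.0] and the valid range for `volume` is (0, 10]. If a user configures a value outside these ranges, the API call may fail.
Consider validating and correcting these values in the `__init__` method to make the code more robust.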
```diff
-        self.speed: float = provider_config.get("glm_tts_speed", 1.0)
-        self.volume: float = provider_config.get("glm_tts_volume", 1.0)
+        self.speed: float = max(0.5, min(2.0, provider_config.get("glm_tts_speed", 1.0)))
+        self.volume: float = max(0.1, min(10.0, provider_config.get("glm_tts_volume", 1.0)))
```
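The clamping suggested above assumes the configured values are already numeric; a string or `None` in the config would raise a `TypeError` inside `max`/`min`. A slightly more defensive variant might look like the sketch below. `clamp_setting` is a hypothetical helper name, and the lower bound of 0.1 for volume follows the suggestion rather than the documented open interval (0, 10].

```python
import math


def clamp_setting(raw, default: float, low: float, high: float) -> float:
    """Coerce a config value to float and clamp it into [low, high].

    Falls back to `default` when the value is missing or not numeric.
    """
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return default
    if math.isnan(value):
        return default
    return max(low, min(high, value))


# Example with a config dict shaped like the one in the review.
provider_config = {"glm_tts_speed": 3.5, "glm_tts_volume": "loud"}
speed = clamp_setting(provider_config.get("glm_tts_speed"), 1.0, 0.5, 2.0)
volume = clamp_setting(provider_config.get("glm_tts_volume"), 1.0, 0.1, 10.0)
```

Whether to clamp silently or log a warning when a value is corrected is a design choice; silent clamping keeps the provider working but can hide a typo in the configuration.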
@sourcery-ai review
Hey - I've found 1 issue, and left some high level feedback:
- In `ProviderGLMTTS.get_audio`, the strict `response.content_type != "audio/wav"` check may reject valid responses (e.g., `audio/wave` or `audio/x-wav`); consider relaxing this to a prefix or `startswith("audio/")` check or inspecting the `Content-Type` header more flexibly.
- For both GLM-ASR and GLM-TTS providers, when the API returns a non-2xx status or an unexpected content type, capturing and logging the response body (in addition to `raise_for_status`) would make debugging provider/API issues much easier.
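The relaxed content-type check from the first point could be sketched as a small pure function. `is_audio_content_type` is a hypothetical name; it works on a raw `Content-Type` header value and is an illustration of the prefix check, not the PR's implementation.

```python
from typing import Optional


def is_audio_content_type(content_type: Optional[str]) -> bool:
    """Return True for any audio/* media type, ignoring parameters and case."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=binary" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type.startswith("audio/")
```

This accepts `audio/wav`, `audio/wave`, and `audio/x-wav` alike, while still rejecting a JSON error body that was returned with a 200 status.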
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location path="astrbot/core/provider/sources/glm_asr_source.py" line_range="100" />
<code_context>
+ }
+
+ try:
+ async with aiohttp.ClientSession() as session:
+ async with session.post(
+ self.api_base,
</code_context>
<issue_to_address>
**suggestion (performance):** Consider reusing an aiohttp.ClientSession instead of creating one per request to reduce connection overhead.
Creating a new ClientSession on every `get_text` call adds avoidable connection setup/teardown overhead, especially under frequent or concurrent use. Since the provider is longer-lived, consider a lazily initialized shared session on the instance and close it in `terminate` to reduce latency and socket churn.
Suggested implementation:
```python
try:
if not hasattr(self, "_session") or self._session is None or self._session.closed:
# Lazily initialize a shared aiohttp ClientSession to avoid
# per-request creation/teardown overhead.
self._session = aiohttp.ClientSession()
session = self._session
async with session.post(
self.api_base,
headers=headers,
json=payload,
timeout=aiohttp.ClientTimeout(total=self.timeout),
) as response:
if response.status != 200:
error_text = await response.text()
logger.error(
f"GLM-ASR API error: {response.status}, body: {error_text}"
)
response.raise_for_status()
```
To fully implement the shared-session approach, you should also:
1. Initialize the attribute in the provider's constructor, e.g. in `__init__` of the GLM-ASR provider class:
- `self._session: Optional[aiohttp.ClientSession] = None`
2. Ensure the session is closed when the provider is torn down, e.g. in a `terminate`/`close`/`__aexit__` method:
-
```python
if hasattr(self, "_session") and self._session is not None and not self._session.closed:
await self._session.close()
```
3. If your base provider class already defines lifecycle hooks, wire this cleanup into those hooks instead of creating a new one, to match the existing conventions in the codebase.
</issue_to_address>
Added BigModel speech-to-text and text-to-speech services.
Modifications
The following two files are added:
astrbot/core/provider/sources/glm_asr_source.py
astrbot/core/provider/sources/glm_tts_source.py
astrbot/core/provider/manager.py - added dynamic imports
astrbot/core/config/default.py - added default configuration
dashboard/src/composables/useProviderSources.ts - added type mapping
dashboard/src/i18n/locales/en-US/features/config-metadata.json - i18n
dashboard/src/i18n/locales/zh-CN/features/config-metadata.json - i18n
astrbot/core/message/components.py - added a `url` field. The QQ adapter calls `Record.fromURL(url)`, but the field was missing here, so STT processing was skipped.
Screenshots or Test Results
Test results:

Logs:
Checklist
😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in `requirements.txt` and `pyproject.toml`.
😮 My changes do not introduce malicious code.
Summary by Sourcery
Add BigModel GLM-based speech-to-text and text-to-speech providers and wire them into core configuration and the dashboard, along with minor message component adjustments for audio URLs.
New Features:
Enhancements:
Documentation: