feat: Support configurable hybrid search fusion weights #2071
Merged
huanghuoguoguo merged 2 commits into langbot-app:master from Mar 24, 2026
Member
@huanghuoguoguo review
    coll.hybrid_search,
    query=query_cfg,
    knn=knn_cfg,
    rank={'rrf': {}},
Collaborator
Please check the official seekdb documentation: is this the right way to call it?
Collaborator
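From Opus
RRF weight tuning (pyseekdb hybrid_search)
pyseekdb's hybrid_search controls weighting along two dimensions: rank (global RRF configuration) and boost (per-path retrieval weight). It ultimately generates JSON that is passed to OceanBase's DBMS_HYBRID_SEARCH.GET_SQL. The core rules are as follows: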
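1. rank parameter: global RRF configuration
Defines the base rules for RRF fusion. Format:
    rank={"rrf": {"rank_window_size": 60, "rank_constant": 60}}
- rank_constant (the k in the RRF formula): RRF score = 1 / (k + rank). The larger k is, the more slowly scores decay for lower-ranked results, and the more the differences between results are flattened. Default: 60.
- rank_window_size: the size of the per-path result window that participates in RRF fusion.
- Passing an empty rank={"rrf": {}} uses OceanBase's default RRF parameters.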
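To make the effect of rank_constant concrete, here is a minimal pure-Python sketch of the formula quoted above. It does not call pyseekdb or OceanBase; the function names `rrf_score` and `top_to_tail_ratio` are hypothetical and exist only for illustration.

```python
def rrf_score(rank: int, k: int = 60) -> float:
    """RRF contribution of a single result at a 1-based rank: 1 / (k + rank)."""
    if rank < 1:
        raise ValueError("rank must be >= 1")
    return 1.0 / (k + rank)


def top_to_tail_ratio(k: int, tail_rank: int = 10) -> float:
    """How many times larger the rank-1 score is than the score at tail_rank."""
    return rrf_score(1, k) / rrf_score(tail_rank, k)


if __name__ == "__main__":
    # A larger k pushes the ratio toward 1, i.e. ranks matter less.
    for k in (1, 60, 600):
        print(f"k={k}: rank 1 vs rank 10 ratio = {top_to_tail_ratio(k):.3f}")
```

The ratio shrinks toward 1 as k grows, which is the "flattening" described in the rank_constant bullet.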
2. boost parameter: per-path retrieval weight
Adjusts the relative importance of full-text search (BM25) and vector search. Core example:
    results = collection.hybrid_search(
        query={
            "where_document": {"$contains": "machine learning"},
            "boost": 0.3,  # weight of the full-text search path
        },
        knn={
            "query_texts": ["AI research"],
            "boost": 0.7,  # weight of the vector search path
            "n_results": 20,
        },
        rank={"rrf": {"rank_constant": 60, "rank_window_size": 100}},
        n_results=10,
    )
- query.boost: weight of the full-text (BM25) path (ultimately applied to query_string.boost);
- knn.boost: weight of the vector path (set directly on the boost field of the knn expression).
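3. Core logic
- rank parameter: pyseekdb passes it through unchanged, and the OceanBase engine performs the RRF fusion;
- boost parameter: scales the score of a single retrieval path, and is the main lever for prioritizing full-text versus vector search.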
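As a rough illustration of how a per-path weight can interact with RRF, the sketch below fuses two ranked lists in pure Python. It assumes each path's RRF contribution is simply multiplied by its boost; the thread does not state how OceanBase actually combines boost with RRF, so treat this formula and the name `weighted_rrf` as hypothetical.

```python
from collections import defaultdict


def weighted_rrf(ranked_lists, boosts, k=60, window=None):
    """Fuse ranked ID lists using boost-scaled RRF scores.

    ranked_lists: dict of path name -> list of doc IDs, best first.
    boosts: dict of path name -> non-negative weight (defaults to 1.0).
    Assumption: score = sum over paths of boost / (k + rank). This is an
    illustrative formula, not OceanBase's confirmed implementation.
    """
    scores = defaultdict(float)
    for path, doc_ids in ranked_lists.items():
        weight = boosts.get(path, 1.0)
        # Only the first `window` results of each path take part in fusion.
        candidates = doc_ids if window is None else doc_ids[:window]
        for rank, doc_id in enumerate(candidates, start=1):
            scores[doc_id] += weight / (k + rank)
    # Highest score first; ties broken by doc ID so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    lists = {"fulltext": ["a", "b", "c"], "vector": ["c", "b", "a"]}
    print(weighted_rrf(lists, {"fulltext": 0.3, "vector": 0.7}))
    print(weighted_rrf(lists, {"fulltext": 0.7, "vector": 0.3}))
```

With the two paths disagreeing on order, swapping the boosts changes which document comes out on top, which is the behavior the boost bullet describes.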
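Parameter summary
| Parameter | Scope | Purpose |
|---|---|---|
| rank.rrf.rank_constant | Global | The k value in the RRF formula; larger values are smoother |
| rank.rrf.rank_window_size | Global | Size of the per-path result window that participates in fusion |
| query.boost | Full-text path | Scaling weight for the full-text search score |
| knn.boost | Vector path | Scaling weight for the vector search score |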
Tuning suggestions
- To favor semantic similarity: raise knn.boost;
- To favor keyword matching: raise query.boost.
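One way to expose these suggestions as a single configurable knob is sketched below. The helper `build_hybrid_search_kwargs` is hypothetical, the choice to make the two boosts sum to 1 is an assumption of this sketch, and the dictionary shapes are copied from the example in this comment rather than checked against the official seekdb documentation.

```python
def build_hybrid_search_kwargs(keyword, query_text, vector_weight=0.5,
                               rank_constant=60, rank_window_size=100,
                               knn_n_results=20, n_results=10):
    """Build keyword arguments for a hybrid_search-style call.

    vector_weight in [0, 1] goes to knn.boost; the remainder goes to
    query.boost (an assumed convention, not something the library requires).
    """
    if not 0.0 <= vector_weight <= 1.0:
        raise ValueError("vector_weight must be between 0 and 1")
    return {
        "query": {
            "where_document": {"$contains": keyword},
            "boost": round(1.0 - vector_weight, 6),  # full-text path weight
        },
        "knn": {
            "query_texts": [query_text],
            "boost": vector_weight,  # vector path weight
            "n_results": knn_n_results,
        },
        "rank": {"rrf": {"rank_constant": rank_constant,
                         "rank_window_size": rank_window_size}},
        "n_results": n_results,
    }


if __name__ == "__main__":
    kwargs = build_hybrid_search_kwargs("machine learning", "AI research",
                                        vector_weight=0.7)
    print(kwargs)
    # With a real collection object this would be passed along as:
    # results = collection.hybrid_search(**kwargs)
```

Keeping the builder free of any database calls makes the weight mapping easy to unit test before it is wired into the actual search path.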
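Summary
- The global rules for RRF fusion are controlled by the rank parameter, chiefly rank_constant (smoothness) and rank_window_size (fusion window);
- The relative weight of full-text versus vector search is controlled by boost, which is the main lever for tuning retrieval priority on the application side;
- pyseekdb only passes the parameters through; the final RRF computation is done by the OceanBase engine.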
Change description
Verification
Notes