AISBench常规命令调用方式是通过--models指定模型任务,通过--datasets指定数据集任务,通过--summarizer指定结果呈现任务来绝对运行的测评任务,AISBench同样也支持指定自定义的配置文件将这三类任务对应的配置文件信息组合在一起,从而实现自定义的任务组合运行。
ais_bench ais_bench/configs/{模型类型}_examples/{任务配置文件名}
# 示例:
ais_bench ais_bench/configs/api_examples/infer_vllm_api_general.py以下示例展示如何同时评测两个服务接口(v1/chat/completions 与 v1/completions)在 GSM8K 与 MATH数据集上的表现。参考示例:demo_infer_vllm_api.py:
from mmengine.config import read_base
from ais_bench.benchmark.partitioners import NaivePartitioner
from ais_bench.benchmark.runners.local_api import LocalAPIRunner
from ais_bench.benchmark.tasks import OpenICLInferTask
from ais_bench.benchmark.models import VLLMCustomAPIChat
with read_base():
from ais_bench.benchmark.configs.summarizers.example import summarizer
from ais_bench.benchmark.configs.datasets.gsm8k.gsm8k_gen_0_shot_cot_str import gsm8k_datasets as gsm8k_0_shot_cot_str
from ais_bench.benchmark.configs.datasets.math.math500_gen_0_shot_cot_chat_prompt import math_datasets as math500_gen_0_shot_cot_chat
from ais_bench.benchmark.configs.models.vllm_api.vllm_api_general import models as vllm_api_general
# 只取部分样本进行 demo 测试
gsm8k_0_shot_cot_str[0]['abbr'] = 'demo_' + gsm8k_0_shot_cot_str[0]['abbr']
gsm8k_0_shot_cot_str[0]['reader_cfg']['test_range'] = '[0:8]'
math500_gen_0_shot_cot_chat[0]['abbr'] = 'demo_' + math500_gen_0_shot_cot_chat[0]['abbr']
math500_gen_0_shot_cot_chat[0]['reader_cfg']['test_range'] = '[0:8]'
datasets = gsm8k_0_shot_cot_str + math500_gen_0_shot_cot_chat # 指定数据集列表,可通过累加添加不同的数据集配置
models = [ # 指定模型配置列表
dict(
attr="service",
type=VLLMCustomAPIChat,
abbr='demo-vllm-api-general-chat',
path="",
model="",
request_rate = 0,
retry = 2,
host_ip = "localhost", # 指定推理服务的IP
host_port = 8080, # 指定推理服务的端口
max_out_len = 512,
batch_size=1,
generation_kwargs = dict(
temperature = 0.5,
top_k = 10,
top_p = 0.95,
seed = None,
repetition_penalty = 1.03,
)
)
]
work_dir = 'outputs/demo_api-vllm-general-chat/'修改好配置文件后,执行如下命令启动精度评测:
ais_bench ais_bench/configs/api_examples/demo_infer_vllm_api_general_chat.py如果需要执行多任务并行,可以在命令行中添加 --max-num-workers参数指定最大任务并行数,示例如下:
ais_bench ais_bench/configs/api_examples/demo_infer_vllm_api_general_chat.py --max-num-workers 4dataset version metric mode demo-vllm-api-general-chat demo-vllm-api-general
----------------------- -------- -------- ----- -------------------------- ---------------------
demo_gsm8k 401e4c accuracy gen 62.50 62.50
demo_math_prm800k_500 c4b6f0 accuracy gen 50.00 62.50以下示例展示如何同时评测两个服务接口(v1/chat/completions 与 v1/completions)使用合成数据集进行性能测评的表现。参考示例:demo_infer_vllm_api_perf.py:
from mmengine.config import read_base
with read_base():
from ais_bench.benchmark.configs.summarizers.example import summarizer
from ais_bench.benchmark.configs.datasets.synthetic.synthetic_gen_string import (
synthetic_datasets,
)
from ais_bench.benchmark.configs.models.vllm_api.vllm_api_general_stream import (
models as vllm_api_general_stream,
)
from ais_bench.benchmark.configs.models.vllm_api.vllm_api_stream_chat import (
models as vllm_api_stream_chat,
)
datasets = synthetic_datasets # 指定数据集列表
vllm_api_general_stream[0]["abbr"] = "demo-" + vllm_api_general_stream[0]["abbr"]
vllm_api_stream_chat[0]["abbr"] = "demo-" + vllm_api_stream_chat[0]["abbr"]
models = vllm_api_general_stream + vllm_api_stream_chat # 指定模型列表
work_dir = "outputs/demo_api-vllm-stream-perf/"修改好配置文件后,执行如下命令启动性能评测:
ais_bench ais_bench/configs/api_examples/demo_infer_vllm_api_perf.py -m perf如果需要执行多任务并行,可以在命令行中添加 --max-num-workers参数指定最大任务并行数,示例如下:
ais_bench ais_bench/configs/api_examples/demo_infer_vllm_api_perf.py -m perf --max-num-workers 2[2025-12-05 12:10:44,147] [ais_bench] [INFO] Performance Results of task [demo-vllm-api-general-stream/syntheticdataset]:
╒══════════════════════════╤═════════╤═════════════════╤═════════════════╤═════════════════╤═════════════════╤═════════════════╤═════════════════╤═════════════════╤═════╕
│ Performance Parameters │ Stage │ Average │ Min │ Max │ Median │ P75 │ P90 │ P99 │ N │
╞══════════════════════════╪═════════╪═════════════════╪═════════════════╪═════════════════╪═════════════════╪═════════════════╪═════════════════╪═════════════════╪═════╡
│ E2EL │ total │ 1734.3 ms │ 544.8 ms │ 3692.3 ms │ 1664.0 ms │ 2081.5 ms │ 2748.4 ms │ 3597.9 ms │ 10 │
├──────────────────────────┼─────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┼─────┤
│ TTFT │ total │ 103.5 ms │ 102.4 ms │ 107.0 ms │ 103.1 ms │ 103.3 ms │ 104.2 ms │ 106.8 ms │ 10 │
...
[2025-12-05 12:10:44,149] [ais_bench] [INFO] Performance Result files located in outputs/demo_api-vllm-general-stream-chat-perf/20251205_121020/performances/demo-vllm-api-general-stream-chat.
[2025-12-05 12:10:44,149] [ais_bench] [INFO] Performance Results of task [demo-vllm-api-stream-chat/syntheticdataset]:
╒══════════════════════════╤═════════╤═════════════════╤═════════════════╤═════════════════╤═════════════════╤════════════════╤═════════════════╤═════════════════╤═════╕
│ Performance Parameters │ Stage │ Average │ Min │ Max │ Median │ P75 │ P90 │ P99 │ N │
╞══════════════════════════╪═════════╪═════════════════╪═════════════════╪═════════════════╪═════════════════╪════════════════╪═════════════════╪═════════════════╪═════╡
│ E2EL │ total │ 3406.7 ms │ 372.4 ms │ 5772.4 ms │ 3589.8 ms │ 4476.6 ms │ 4921.1 ms │ 5647.1 ms │ 10 │
├──────────────────────────┼─────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┼────────────────┼─────────────────┼─────────────────┼─────┤
│ TTFT │ total │ 103.2 ms │ 102.0 ms │ 107.5 ms │ 102.9 ms │ 103.4 ms │ 104.3 ms │ 107.2 ms │ 10 │默认情况下,自定义配置文件中的模型与数据集组合会自动根据模型配置文件中的models列表和数据集配置文件中的datasets列表进行笛卡尔组合,组合数量为模型配置文件中的models列表长度与数据集配置文件中的datasets列表长度之积。用户可以通过在配置文件中配置model_dataset_combinations自定义模型数据集组合。
from mmengine.config import read_base
with read_base():
from ais_bench.benchmark.configs.summarizers.example import summarizer
from ais_bench.benchmark.configs.datasets.gsm8k.gsm8k_gen_0_shot_cot_str import gsm8k_datasets as gsm8k_0_shot_cot_str
from ais_bench.benchmark.configs.datasets.math.math500_gen_0_shot_cot_chat_prompt import math_datasets as math500_gen_0_shot_cot_chat
from ais_bench.benchmark.configs.models.vllm_api.vllm_api_general import models as vllm_api_general
from ais_bench.benchmark.configs.models.vllm_api.vllm_api_general_chat import models as vllm_api_general_chat
from ais_bench.benchmark.configs.models.vllm_api.vllm_api_stream_chat import models as vllm_api_stream_chat
models = vllm_api_general + vllm_api_general_chat + vllm_api_stream_chat
datasets = gsm8k_0_shot_cot_str + math500_gen_0_shot_cot_chat
model_dataset_combinations = [
dict(models=[models[0]], datasets=[datasets[0]]), # 组合1,使用模型0(vllm_api_general)与数据集0(gsm8k_0_shot_cot_str)进行组合
dict(models=[models[1]], datasets=[datasets[1]]), # 组合2,使用模型1(vllm_api_general_chat)与数据集1(math500_gen_0_shot_cot_chat)进行组合
dict(models=[models[2]], datasets=[datasets[0], datasets[1]]), # 组合3,使用模型2(vllm_api_stream_chat)与数据集0(gsm8k_0_shot_cot_str)和数据集1(math500_gen_0_shot_cot_chat)进行组合
...
]
⚠️ 注意:需要用abbr参数指定模型与数据集的唯一标识。同一配置文件中,相同abbr的模型与数据集只能组合一次。如下实例中,vllm_api_general_copy与vllm_api_general的abbr相同,所以会被认为与组合1是相同任务,会被跳过,即便内部参数存在区别:
from mmengine.config import read_base
with read_base():
from ais_bench.benchmark.configs.models.vllm_api.vllm_api_general import models as vllm_api_general
from ais_bench.benchmark.configs.datasets.math.math500_gen_0_shot_cot_chat_prompt import math_datasets as math500_gen_0_shot_cot_chat
vllm_api_general_copy = vllm_api_general.copy()
vllm_api_general_copy[0]['port'] = 8081
models = vllm_api_general_copy + vllm_api_general
datasets = math500_gen_0_shot_cot_chat
model_dataset_combinations = [
dict(models=[models[1]], datasets=datasets), # 组合1,使用模型1(vllm_api_general)与数据集(math500_gen_0_shot_cot_chat)进行组合
dict(models=[models[0]], datasets=datasets), # 组合2,使用模型0(vllm_api_general_copy)与数据集0(math500_gen_0_shot_cot_chat)进行组合,由于vllm_api_general_copy与vllm_api_general的abbr相同,所以会被认为与组合1是相同任务,会被跳过,即便内部参数存在区别
]正确做法:在对模型或数据集配置进行复用时,修改abbr参数,使其与原模型或数据集不同,例如:
vllm_api_general_copy = vllm_api_general.copy()
vllm_api_general_copy[0]['abbr'] = vllm_api_general[0]['abbr'] + '-copy' # 修改abbr,标识模型这样vllm_api_general_copy[0]与vllm_api_general[0]的abbr不同,组合2与组合1是不同任务,会被正常执行。
| 文件名 | 简介 |
|---|---|
| infer_vllm_api_general.py | 基于gsm8k数据集使用vllm api(0.6+版本)访问v1/completions子服务进行评测,prompt格式为字符串格式,自定义了数据集路径 |
| infer_mindie_stream_api_general.py | 基于gsm8k数据集使用mindie stream api访问infer子服务进行评测,prompt格式为字符串格式,自定义了数据集路径 |
| infer_vllm_api_general_chat.py | 基于gsm8k数据集使用vllm api(0.6+版本)访问v1/chat/completions子服务进行评测,prompt格式为对话格式,自定义了数据集路径 |
| infer_vllm_api_stream_chat.py | 基于gsm8k数据集使用vllm api(0.6+版本)访问v1/chat/completions子服务使用流式推理进行评测,prompt格式为对话格式,自定义了数据集路径 |
| infer_hf_base_model.py | 基于gsm8k数据集使用huggingface base模型的推理接口进行评测,prompt格式为字符串格式,自定义了数据集路径 |
| infer_hf_chat_model.py | 基于gsm8k数据集使用huggingface chat模型的推理接口进行评测,prompt格式为字符串格式,自定义了数据集路径 |
注: 上述自定义配置文件如果要评测其他数据集,请从ais_bench/configs/api_examples/all_dataset_configs.py导入其他数据集。