
Error when running an fp16 ONNX model with triton_gpu #2761

@lukeewin

Description

1. Environment

GPU: NVIDIA RTX 3090 Ti

2. Steps

Following the README in funasr/runtime/triton_gpu, I built the Docker image, started it, and entered the container. Inside the container I ran export_onnx.py to export the ONNX model, after first uncommenting the fp16-conversion lines at the bottom of export_onnx.py. Running the conversion produced model_fp16.onnx. I then changed the config file under /model_repo_sense_voice_small/encoder to the following:

name: "encoder"
backend: "onnxruntime"
default_model_filename: "model_fp16.onnx"

max_batch_size: 16

input [
  {
    name: "speech"
    data_type: TYPE_FP16
    dims: [-1, 560]
  },
  {
    name: "speech_lengths"
    data_type: TYPE_INT32
    dims: [1]
    reshape: { shape: [ ] }
  },
  {
    name: "language"
    data_type: TYPE_INT32
    dims: [1]
    reshape: { shape: [ ] }
  },
  {
    name: "textnorm"
    data_type: TYPE_INT32
    dims: [1]
    reshape: { shape: [ ] }
  }
]

output [
  {
    name: "ctc_logits"
    data_type: TYPE_FP16
    dims: [-1, 25055]
  },
  {
    name: "encoder_out_lens"
    data_type: TYPE_INT32
    dims: [1]
    reshape: { shape: [ ] }
  }
]

dynamic_batching {
  max_queue_delay_microseconds: 1000
}
parameters { key: "cudnn_conv_algo_search" value: { string_value: "2" } }

instance_group [
    {
      count: 1
      kind: KIND_GPU
    }
]
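
For reference, one way to check whether the TYPE_FP16 / TYPE_INT32 entries above actually match what the converter wrote into model_fp16.onnx is to print the graph's input/output element types. This is only a minimal sketch using the onnx Python package; the file path is the one produced by export_onnx.py shown further down.

import onnx

# Print the element type of every graph input and output of the exported
# model so they can be compared against the Triton config above.
m = onnx.load("model_fp16.onnx")
for value in list(m.graph.input) + list(m.graph.output):
    elem_type = value.type.tensor_type.elem_type
    print(value.name, onnx.TensorProto.DataType.Name(elem_type))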

Running the run.sh script inside the container then fails with the error below.

[screenshot of the error output]

How should I handle this if I want to use an fp16 ONNX model? The fp32-to-fp16 conversion was done with the export_onnx.py code provided inside the Docker image; is there a problem with that code? export_onnx.py is shown below.

import torch
from model import SenseVoiceSmall
# export_rebuild_model is imported/defined in the full export_onnx.py script (see link below)

model_dir = "iic/SenseVoiceSmall"
# model_dir = "./SenseVoiceSmall"
model, kwargs = SenseVoiceSmall.from_pretrained(model=model_dir)
# model = model.to("cpu")
model = export_rebuild_model(model, max_seq_len=512, device="cuda")
# model.export()
print("Export Done.")

dummy_inputs = model.export_dummy_inputs()

# Export the fp32 model
torch.onnx.export(
    model,
    dummy_inputs,
    "model.onnx",
    input_names=model.export_input_names(),
    output_names=model.export_output_names(),
    dynamic_axes=model.export_dynamic_axes(),
    opset_version=18,
)

# Convert the fp32 model to fp16
import onnxmltools
from onnxmltools.utils.float16_converter import convert_float_to_float16

decoder_onnx_model = onnxmltools.utils.load_model("model.onnx")
decoder_onnx_model = convert_float_to_float16(decoder_onnx_model)
decoder_onnx_path = "model_fp16.onnx"
onnxmltools.utils.save_model(decoder_onnx_model, decoder_onnx_path)
print("Model has been successfully exported to model_fp16.onnx")

The full code is available at https://huggingface.co/yuekai/model_repo_sense_voice_small/blob/main/export_onnx.py

Even without fp16, the fp32 ONNX model exported by https://huggingface.co/yuekai/model_repo_sense_voice_small/blob/main/export_onnx.py also fails at runtime. If I instead use model_quant.onnx from https://modelscope.cn/models/iic/SenseVoiceSmall-onnx/files, everything works fine. Is this caused by a problem in the export code above?
What should I do if I want to use an fp16 or int8 ONNX model?
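
For reference, my understanding is that int8 ONNX files such as model_quant.onnx are typically produced with onnxruntime's dynamic quantization; this is only a minimal sketch, assuming the default quantize_dynamic settings are applicable to this model.

from onnxruntime.quantization import QuantType, quantize_dynamic

# Dynamically quantize fp32 weights to 8-bit; activations stay fp32 and are
# quantized on the fly at inference time.
quantize_dynamic(
    model_input="model.onnx",
    model_output="model_quant.onnx",
    weight_type=QuantType.QUInt8,
)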
Thanks
@yuekaizhang
