
Using openvino-quantized embedder and reranker from huggingface hub #3493

@chtanch

Description


For my application (which relies on OpenVINO Model Server), I would like to use OpenVINO-quantized models from the Hugging Face Hub and avoid doing the quantization step myself.

For chat LLM models, e.g. OpenVINO/Qwen2.5-7B-Instruct-int4-ov, I'll need an additional graph.pbtxt for OVMS to work. It seems that the same graph.pbtxt can be used for all such models, so I can include a pre-generated graph.pbtxt.

However, for embedder (and reranker) models, e.g. OpenVINO/bge-base-en-v1.5-int8-ov, I'll need to include graph.pbtxt, openvino_detokenizer.bin, and openvino_detokenizer.xml. The tokenizer files appear to be model-dependent, so reusing pre-generated files is not reliable.

Is there a solution for using OpenVINO-quantized embedder/reranker models from the Hugging Face Hub? Or do I have to quantize the base models (e.g. BAAI/bge-base-en-v1.5) myself with export_model.py?
