Description
For my application (which relies on OpenVINO Model Server), I would like to use OpenVINO-quantized models from the Hugging Face Hub and avoid doing the quantization step myself.
For chat LLM models, e.g. OpenVINO/Qwen2.5-7B-Instruct-int4-ov, I'll need an additional graph.pbtxt for OVMS to work. The same graph.pbtxt seems to work for all chat models, so I can ship a single pre-generated graph.pbtxt.
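For reference, the pre-generated graph.pbtxt I have in mind looks roughly like the one from the OVMS continuous-batching demo. This is an illustrative sketch only, not a canonical config; field names are copied from that demo and may differ between OVMS versions:

```
input_stream: "HTTP_REQUEST_PAYLOAD:input"
output_stream: "HTTP_RESPONSE_PAYLOAD:output"

node: {
  name: "LLMExecutor"
  calculator: "HttpLLMCalculator"
  input_stream: "LOOPBACK:loopback"
  input_stream: "HTTP_REQUEST_PAYLOAD:input"
  input_side_packet: "LLM_NODE_RESOURCES:llm"
  output_stream: "LOOPBACK:loopback"
  output_stream: "HTTP_RESPONSE_PAYLOAD:output"
  input_stream_info: {
    tag_index: "LOOPBACK:0",
    back_edge: true
  }
  node_options: {
    [type.googleapis.com / mediapipe.LLMCalculatorOptions]: {
      models_path: "./"
    }
  }
}
```

Because the calculator only needs `models_path` pointed at the model directory, this file is model-independent for chat models, which is why pre-generating it once seems viable.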
However, for embedder (and reranker) models, e.g. OpenVINO/bge-base-en-v1.5-int8-ov, I'll need to include graph.pbtxt, openvino_detokenizer.bin, and openvino_detokenizer.xml. The detokenizer files appear to be model-dependent, so shipping pre-generated copies is not reliable.
Is there a solution for using OpenVINO-quantized embedder/reranker models straight from the Hugging Face Hub? Or do I have to quantize the base models (e.g. BAAI/bge-base-en-v1.5) myself with export_model.py?