
Commit e9c12e3

Phi-4 basic inference with native/vllm (#1563)
1 parent 570afc2 commit e9c12e3

File tree: 3 files changed (+66, -2 lines)
README: 10 additions & 2 deletions
@@ -1,6 +1,9 @@
-# Phi-4-multimodal-instruct
+# **Phi-4-multimodal-instruct 5.6B**
 
-Configs for Phi-4-multimodal-instruct 5.6B model. See https://huggingface.co/microsoft/Phi-4-multimodal-instruct
+Configs for Phi-4-multimodal-instruct 5.6B model.
+🔗 **Reference:** [Phi-4-multimodal-instruct on Hugging Face](https://huggingface.co/microsoft/Phi-4-multimodal-instruct)
+
+---
 
 This is a multimodal model that combines text, visual, and audio inputs.
 It uses a "Mixture of LoRAs" approach, allowing you to plug in adapters for each
@@ -9,3 +12,8 @@ reading the following:
 
 - [Mixture-of-LoRAs](https://arxiv.org/abs/2403.03432)
 - [Phi-4 Multimodal Technical Report](https://arxiv.org/abs/2503.01743)
+
+⚠️ This model requires `flash attention 2`. Run the following if executing in a custom fashion:
+
+```sh
+pip install -U flash-attn --no-build-isolation
+```
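
Putting the pieces of this commit together, a minimal quick-start sketch for the native engine, using the install command above and the paths taken from the `infer.yaml` usage comment below:

```sh
# Flash Attention 2 build required by the model (see the README note above).
pip install -U flash-attn --no-build-isolation

# Interactive multimodal inference with the native-engine config added in this commit.
oumi infer -i -c configs/recipes/vision/phi4/inference/infer.yaml \
  --image "tests/testdata/images/the_great_wave_off_kanagawa.jpg"
```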
configs/recipes/vision/phi4/inference/infer.yaml: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
# Phi-4-multimodal-instruct 5.6B inference config.
#
# Requirements:
# - Run `pip install -U flash-attn --no-build-isolation`
#
# Usage:
#   oumi infer -i -c configs/recipes/vision/phi4/inference/infer.yaml \
#     --image "tests/testdata/images/the_great_wave_off_kanagawa.jpg"
#
# See Also:
# - Documentation: https://oumi.ai/docs/en/latest/user_guides/infer/infer.html
# - Config class: oumi.core.configs.InferenceConfig
# - Config source: https://github.com/oumi-ai/oumi/blob/main/src/oumi/core/configs/inference_config.py
# - Other inference configs: configs/**/inference/

model:
  model_name: "microsoft/Phi-4-multimodal-instruct"
  torch_dtype_str: "bfloat16"
  model_max_length: 4096
  trust_remote_code: True
  attn_implementation: "flash_attention_2" # The model requires Flash Attention.

generation:
  max_new_tokens: 64
  batch_size: 1

engine: NATIVE
configs/recipes/vision/phi4/inference/vllm_infer.yaml: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
# Phi-4-multimodal-instruct 5.6B vLLM inference config.
#
# Requirements:
# - Run `pip install vllm`
# - Run `pip install -U flash-attn --no-build-isolation`
#
# Usage:
#   oumi infer -i -c configs/recipes/vision/phi4/inference/vllm_infer.yaml \
#     --image "tests/testdata/images/the_great_wave_off_kanagawa.jpg"
#
# See Also:
# - Documentation: https://oumi.ai/docs/en/latest/user_guides/infer/infer.html
# - Config class: oumi.core.configs.InferenceConfig
# - Config source: https://github.com/oumi-ai/oumi/blob/main/src/oumi/core/configs/inference_config.py
# - Other inference configs: configs/**/inference/

model:
  model_name: "microsoft/Phi-4-multimodal-instruct"
  torch_dtype_str: "bfloat16"
  model_max_length: 4096
  trust_remote_code: True
  attn_implementation: "flash_attention_2" # The model requires Flash Attention.

generation:
  max_new_tokens: 64
  batch_size: 1

engine: VLLM
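
The two configs share the same model and generation settings; they differ only in the `engine` field (NATIVE vs. VLLM) and the extra `pip install vllm` requirement. A minimal sketch of the equivalent vLLM run, using the commands and paths from the config's own comments:

```sh
# vLLM backend plus the Flash Attention 2 build the model requires.
pip install vllm
pip install -U flash-attn --no-build-isolation

# Interactive multimodal inference with the vLLM-engine config above.
oumi infer -i -c configs/recipes/vision/phi4/inference/vllm_infer.yaml \
  --image "tests/testdata/images/the_great_wave_off_kanagawa.jpg"
```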
