3 files changed: +66 -2 lines changed

**configs/recipes/vision/phi4/README.md**
- # Phi-4-multimodal-instruct
+ # **Phi-4-multimodal-instruct 5.6B**

- Configs for Phi-4-multimodal-instruct 5.6B model. See https://huggingface.co/microsoft/Phi-4-multimodal-instruct
+ Configs for the Phi-4-multimodal-instruct 5.6B model.
+
+ 🔗 **Reference:** [Phi-4-multimodal-instruct on Hugging Face](https://huggingface.co/microsoft/Phi-4-multimodal-instruct)
+
+ ---

This is a multimodal model that combines text, visual, and audio inputs.
It uses a "Mixture of LoRAs" approach, allowing you to plug in adapters for each
@@ -9,3 +12,8 @@ reading the following:
- [Mixture-of-LoRAs](https://arxiv.org/abs/2403.03432)
- [Phi-4 Multimodal Technical Report](https://arxiv.org/abs/2503.01743)
+
+ ⚠️ This model requires `flash attention 2`. Run the following if you are executing the model in a custom setup:
+ ```sh
+ pip install -U flash-attn --no-build-isolation
+ ```

**configs/recipes/vision/phi4/inference/infer.yaml** (new file)
```yaml
# Phi-4-multimodal-instruct 5.6B inference config.
#
# Requirements:
#   - Run `pip install -U flash-attn --no-build-isolation`
#
# Usage:
#   oumi infer -i -c configs/recipes/vision/phi4/inference/infer.yaml \
#     --image "tests/testdata/images/the_great_wave_off_kanagawa.jpg"
#
# See Also:
#   - Documentation: https://oumi.ai/docs/en/latest/user_guides/infer/infer.html
#   - Config class: oumi.core.configs.InferenceConfig
#   - Config source: https://github.com/oumi-ai/oumi/blob/main/src/oumi/core/configs/inference_config.py
#   - Other inference configs: configs/**/inference/

model:
  model_name: "microsoft/Phi-4-multimodal-instruct"
  torch_dtype_str: "bfloat16"
  model_max_length: 4096
  trust_remote_code: True
  attn_implementation: "flash_attention_2" # The model requires Flash Attention.

generation:
  max_new_tokens: 64
  batch_size: 1

engine: NATIVE
```
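
Besides the `oumi infer -i` CLI shown in the config header, a config like this can in principle be driven from Python. The sketch below assumes Oumi's top-level `infer()` entry point and a `from_yaml` loader on the `InferenceConfig` class named above; treat both as assumptions to verify against the Oumi documentation linked in the config.

```python
# Hedged sketch: programmatic equivalent of `oumi infer -c ...`.
# `oumi.infer` and `InferenceConfig.from_yaml` are assumed entry points --
# check the Oumi docs linked in the config header before relying on them.
from oumi import infer
from oumi.core.configs import InferenceConfig

config = InferenceConfig.from_yaml(
    "configs/recipes/vision/phi4/inference/infer.yaml"
)
conversations = infer(config)  # assumed to return the model's responses
for conversation in conversations:
    print(conversation)
```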

**configs/recipes/vision/phi4/inference/vllm_infer.yaml** (new file)
```yaml
# Phi-4-multimodal-instruct 5.6B vLLM inference config.
#
# Requirements:
#   - Run `pip install vllm`
#   - Run `pip install -U flash-attn --no-build-isolation`
#
# Usage:
#   oumi infer -i -c configs/recipes/vision/phi4/inference/vllm_infer.yaml \
#     --image "tests/testdata/images/the_great_wave_off_kanagawa.jpg"
#
# See Also:
#   - Documentation: https://oumi.ai/docs/en/latest/user_guides/infer/infer.html
#   - Config class: oumi.core.configs.InferenceConfig
#   - Config source: https://github.com/oumi-ai/oumi/blob/main/src/oumi/core/configs/inference_config.py
#   - Other inference configs: configs/**/inference/

model:
  model_name: "microsoft/Phi-4-multimodal-instruct"
  torch_dtype_str: "bfloat16"
  model_max_length: 4096
  trust_remote_code: True
  attn_implementation: "flash_attention_2" # The model requires Flash Attention.

generation:
  max_new_tokens: 64
  batch_size: 1

engine: VLLM
```
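
For comparison with `engine: NATIVE`, here is a minimal sketch of what the `VLLM` engine corresponds to, using vLLM's offline `LLM` API directly. The constructor arguments mirror the `model:` block above; the prompt tokens are an assumption taken from the model card, `multi_modal_data` is vLLM's documented image-passing convention, and depending on your vLLM version this model may need additional flags.

```python
# Minimal sketch: direct vLLM offline inference, mirroring the config above.
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(
    model="microsoft/Phi-4-multimodal-instruct",
    trust_remote_code=True,  # trust_remote_code: True
    dtype="bfloat16",        # torch_dtype_str: "bfloat16"
    max_model_len=4096,      # model_max_length: 4096
)

params = SamplingParams(max_tokens=64)  # max_new_tokens: 64

image = Image.open("tests/testdata/images/the_great_wave_off_kanagawa.jpg")
outputs = llm.generate(
    {
        # Prompt tokens follow the Phi-4 format from the model card (assumed).
        "prompt": "<|user|><|image_1|>Describe this image.<|end|><|assistant|>",
        "multi_modal_data": {"image": image},
    },
    params,
)
print(outputs[0].outputs[0].text)
```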