VLM Finetuning support #411

nikita-smetanin · 2025-12-22T20:28:32Z

Add support for Multimodal datasets in OpenAI-like format
Add support for Vision-Language model training with optional Vision encoder finetuning

src/together/utils/files.py

connermanuel · 2025-12-23T06:06:19Z

tests/unit/test_finetune_resources.py

certainly dont mind this change but i wonder how it got in

Our black formatter - we don't enforce it for commits but it's configured here

connermanuel · 2025-12-23T06:12:23Z

src/together/cli/api/finetune.py


+    if model_limits.supports_vision:
+        # Don't show price estimation for multimodal models yet
+        confirm = True


sorry, i don't have context here, why does this prevent showing the price estimation?

Its estimation would be very off for multimodal models - due to the way we predict token counts, etc. This is something to address in the future.

ah yes, i mean why does setting the variable confirm = True disable estimation

It avoids calling click.confirm(...), which shows price estimation and waits for the user input few lines below.

src/together/resources/finetune.py

connermanuel · 2025-12-23T06:17:50Z

src/together/utils/files.py

                    line_number=idx + 1,
                    error_source="key_value",
                )
-            if not isinstance(message[column], str):


perhaps you can check isinstance(message[column], MessageContent) instead?

MessageContent contains parameterized types like list[dict...], isinstance doesn't support those

connermanuel · 2025-12-23T06:18:39Z

src/together/utils/files.py

 def _check_message_role(
-    message: Dict[str, str | bool], previous_role: str | None, idx: int
-) -> str | bool:
+    message: Dict[str, str | int | MessageContent], previous_role: str | None, idx: int


when is the message an int?

Enforced by linter - due to original message struct containing int fields as well ("weights"). I didn't want to introduce any ignores here. Same below

ah got it, its for weights - thanks!

src/together/utils/files.py

nikita-smetanin added 7 commits December 22, 2025 18:39

Support Multimodal datasets

b93a673

Support Multimodal datasets

a71eee3

Support Multimodal datasets

0890d35

Support Multimodal datasets

367f606

Support VLM finetuning

b026e4e

Support VLM finetuning

c93a870

Support VLM finetuning

158ae5a

nikita-smetanin requested review from connermanuel and sbassam December 22, 2025 20:28

sbassam approved these changes Dec 23, 2025

View reviewed changes

src/together/utils/files.py Outdated Show resolved Hide resolved

src/together/utils/files.py Show resolved Hide resolved

connermanuel approved these changes Dec 23, 2025

View reviewed changes

nikita-smetanin added 3 commits December 23, 2025 13:42

Support VLM finetuning

9a25145

Support VLM finetuning

cb3f97f

Support VLM finetuning

b7ef947

nikita-smetanin merged commit eba5e5f into main Dec 23, 2025
11 checks passed

nikita-smetanin deleted the nikita/vlm_finetuning_support branch December 23, 2025 14:19

VLM Finetuning support #411

VLM Finetuning support #411

Uh oh!

Conversation

nikita-smetanin commented Dec 22, 2025

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants