Skip to content

Conversation

@nikita-smetanin
Copy link
Contributor

  • Add support for Multimodal datasets in OpenAI-like format
  • Add support for Vision-Language model training with optional Vision encoder finetuning

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

certainly dont mind this change but i wonder how it got in

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our black formatter - we don't enforce it for commits but it's configured here


if model_limits.supports_vision:
# Don't show price estimation for multimodal models yet
confirm = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, i don't have context here, why does this prevent showing the price estimation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its estimation would be very off for multimodal models - due to the way we predict token counts, etc. This is something to address in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yes, i mean why does setting the variable confirm = True disable estimation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It avoids calling click.confirm(...), which shows price estimation and waits for the user input few lines below.

line_number=idx + 1,
error_source="key_value",
)
if not isinstance(message[column], str):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps you can check isinstance(message[column], MessageContent) instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MessageContent contains parameterized types like list[dict...], isinstance doesn't support those

def _check_message_role(
message: Dict[str, str | bool], previous_role: str | None, idx: int
) -> str | bool:
message: Dict[str, str | int | MessageContent], previous_role: str | None, idx: int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when is the message an int?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enforced by linter - due to original message struct containing int fields as well ("weights"). I didn't want to introduce any ignores here. Same below

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah got it, its for weights - thanks!

@nikita-smetanin nikita-smetanin merged commit eba5e5f into main Dec 23, 2025
11 checks passed
@nikita-smetanin nikita-smetanin deleted the nikita/vlm_finetuning_support branch December 23, 2025 14:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants