-
Notifications
You must be signed in to change notification settings - Fork 21
VLM Finetuning support #411
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
nikita-smetanin
commented
Dec 22, 2025
- Add support for Multimodal datasets in OpenAI-like format
- Add support for Vision-Language model training with optional Vision encoder finetuning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
certainly dont mind this change but i wonder how it got in
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our black formatter - we don't enforce it for commits but it's configured here
|
|
||
| if model_limits.supports_vision: | ||
| # Don't show price estimation for multimodal models yet | ||
| confirm = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, i don't have context here, why does this prevent showing the price estimation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its estimation would be very off for multimodal models - due to the way we predict token counts, etc. This is something to address in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yes, i mean why does setting the variable confirm = True disable estimation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It avoids calling click.confirm(...), which shows price estimation and waits for the user input few lines below.
| line_number=idx + 1, | ||
| error_source="key_value", | ||
| ) | ||
| if not isinstance(message[column], str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps you can check isinstance(message[column], MessageContent) instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MessageContent contains parameterized types like list[dict...], isinstance doesn't support those
| def _check_message_role( | ||
| message: Dict[str, str | bool], previous_role: str | None, idx: int | ||
| ) -> str | bool: | ||
| message: Dict[str, str | int | MessageContent], previous_role: str | None, idx: int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when is the message an int?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enforced by linter - due to original message struct containing int fields as well ("weights"). I didn't want to introduce any ignores here. Same below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah got it, its for weights - thanks!