
Add support for multimodal input #19

@noorbhatia

Description

Thanks a lot for making this package. I'm keen to switch to it for a production app.

Many of the language model providers that AnyLanguageModel already supports have strong multimodal capabilities:

  • MLX: supports vision-language models like Qwen2-VL and other multimodal models
  • OpenAI: GPT-4o, GPT-4o-mini, and GPT-4-turbo all support vision

Currently, users cannot leverage these vision capabilities through AnyLanguageModel, which limits the library's usefulness for multimodal applications.

Use Cases:

  • Image captioning and description
  • OCR and document understanding
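
To make the request concrete, here is a rough sketch of what a multimodal call could look like, reusing the session-style API the package already exposes in its README. The `attachments:` parameter and `.image` case are invented for illustration only; they are not part of the current API:

```swift
import Foundation
import AnyLanguageModel

// Session setup follows the package's existing OpenAI adapter style.
let model = OpenAILanguageModel(
    apiKey: ProcessInfo.processInfo.environment["OPENAI_API_KEY"] ?? "",
    model: "gpt-4o"
)
let session = LanguageModelSession(model: model)

// Hypothetical `attachments:` parameter pairing text with image data,
// e.g. for the OCR / document-understanding use case above.
let imageData = try Data(contentsOf: URL(fileURLWithPath: "receipt.png"))
let response = try await session.respond(
    to: "Extract the total amount from this receipt.",
    attachments: [.image(imageData)] // does not exist yet — illustration only
)
print(response.content)
```

The exact shape (attachment enum, a richer `Prompt` type, or per-provider options) is of course up to you; the point is just that images would flow through the same unified session interface as text.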
