Finetuning Image Text Vectorizer with CLIP #21


Hello, I tried finetuning the Image-Text Vectorizer CLIP model using the above approach, but I am stuck on this error:

![image](https://user-images.githubusercontent.com/49594316/136137847-09058ca8-e2e6-4cc1-b127-2bdaa0d33f66.png)

Link to full code - [Colab](https://colab.research.google.com/drive/1_4t67egj76v7Bf2-ayLAhNZKtuOXgtxW#scrollTo=sDMKXZnADQet)

What I need is something that gives the cosine similarity between an image and a text. Should I finetune with triplet or with cosine similarity? If it's cosine similarity, how would I get those cosine similarity values?

The triplet variant takes a text and an image and gives one normalised vector. I am a bit confused, because I thought it would give a cosine similarity.
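To make the question concrete, here is a minimal sketch of how a cosine similarity is usually computed from two embedding vectors, assuming the model can return one vector for the image and one for the text. The vectors below are made-up placeholders rather than real CLIP outputs, and `l2_normalize` and `cosine_similarity` are hypothetical helper names, not functions from any library.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    a_unit = l2_normalize(a)
    b_unit = l2_normalize(b)
    # For unit-length vectors the dot product equals the cosine similarity.
    return sum(x * y for x, y in zip(a_unit, b_unit))


# Placeholder embeddings (assumption): in practice these would come from
# the image encoder and the text encoder of the finetuned model.
image_embedding = [0.2, 0.1, 0.7]
text_embedding = [0.25, 0.05, 0.65]

score = cosine_similarity(image_embedding, text_embedding)
print(f"cosine similarity: {score:.4f}")
```

If the vectors are already normalised, the normalisation step changes nothing and the score reduces to a plain dot product.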

