Finetuning Image Text Vectorizer with CLIP #21

@singularity014

Hello, I tried fine-tuning the Image-Text Vectorizer CLIP model using the above approach, but I get stuck on this error:

[image: screenshot of the error]

Link to full code - Colab

What I need is something that gives the cosine similarity between an image and a text. Should I fine-tune with the triplet variant or with the cosine similarity variant? If it is cosine similarity, how do I get those cosine similarity scores?

The triplet variant takes a text and an image and gives one normalised vector. I am a bit confused, because I thought it would give a cosine similarity.
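To make clear what output I am after, here is a minimal sketch of the score I would like to compute. It assumes I could somehow obtain two separate embedding vectors of the same length, one for the image and one for the text. The vectors below are made-up placeholder values, not outputs of the model, and `cosine_similarity` is just a helper I wrote for illustration, not part of any library.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical placeholder embeddings (assumption: the real ones would come
# from the image encoder and the text encoder respectively).
image_embedding = [0.6, 0.8, 0.0]
text_embedding = [0.8, 0.6, 0.0]

score = cosine_similarity(image_embedding, text_embedding)
print(f"image-text cosine similarity: {score:.4f}")
```

If both vectors are already normalised to unit length, the score reduces to a plain dot product, so I would expect to need one vector per modality rather than a single combined vector.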
