Issue with ViT in BioClip Visual Part: ViT Returns CLS Token Instead of Logits #52

We are using the visual part (ViT) of BioClip to process images. However, there is an issue with the forward method in BaseCAM.
In the following lines of code:

`self.outputs = outputs = self.activations_and_grads(input_tensor)`

`target_categories = np.argmax(outputs.cpu().data.numpy(), axis=-1)`

Here, `outputs` is the CLS token embedding: a high-dimensional feature vector that represents the global semantic information of the input image, not a classification result or logits. Taking `np.argmax` over it therefore returns the index of the largest feature dimension rather than a predicted class.
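
To illustrate the mismatch, here is a minimal NumPy sketch of one possible workaround: projecting the embedding onto a set of per-class embeddings to obtain similarity logits before taking the argmax. The function `embedding_to_logits`, the `logit_scale` value, and the random placeholder data are all assumptions made for illustration; they are not part of BioClip or of the CAM library.

```python
import numpy as np

def embedding_to_logits(image_emb, class_embs, logit_scale=100.0):
    """Map (batch, dim) embeddings to (batch, num_classes) similarity logits.

    Hypothetical helper: class_embs stands in for one embedding per class
    (for example, text embeddings of class names). logit_scale is an
    assumed temperature, not a value taken from BioClip.
    """
    image_emb = image_emb / np.linalg.norm(image_emb, axis=-1, keepdims=True)
    class_embs = class_embs / np.linalg.norm(class_embs, axis=-1, keepdims=True)
    # Cosine similarity between every image and every class, then scaled.
    return logit_scale * image_emb @ class_embs.T

rng = np.random.default_rng(0)
dim, num_classes = 512, 3

# Placeholder data: random class embeddings and two fake image embeddings
# that lie close to class 2 and class 0 respectively.
class_embs = rng.normal(size=(num_classes, dim))
image_emb = class_embs[[2, 0]] + 0.1 * rng.normal(size=(2, dim))

# What the quoted lines effectively do: argmax over feature dimensions.
feature_index = np.argmax(image_emb, axis=-1)  # values in [0, dim), not classes

# Argmax over similarity logits gives a class index instead.
logits = embedding_to_logits(image_emb, class_embs)
target_categories = np.argmax(logits, axis=-1)
```

In practice the same projection would presumably live in a small wrapper around the visual encoder, so that `outputs` already has shape `(batch, num_classes)` when it reaches the `np.argmax` call. Passing explicit targets to the CAM object, if the library supports that, would be another way to avoid the default argmax.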
