Efficient Masked Attention Transformer for Few-Shot Classification and Segmentation

Dustin Carrión-Ojeda¹,²     Stefan Roth¹,²     Simone Schaub-Meyer¹,²

¹TU Darmstadt     ²hessian.AI

Accepted to GCPR 2025


TL;DR: EMAT processes high-resolution correlation tokens, boosting few-shot classification and segmentation, especially for small objects, while using at least four times fewer parameters than existing methods. It supports N-way K-shot tasks and correctly outputs empty masks when no target is present.

Installation

This project was originally developed using Python 3.9, PyTorch 2.1.1, and CUDA 12.1 on Linux. To reproduce the environment, follow these steps:

# 1) Clone the repository
git clone https://github.com/visinf/emat

# 2) Move into the repository
cd emat

# 3) Create the conda environment
conda create -n emat -c conda-forge -c nvidia -c pytorch \
    python=3.9 cuda-version=12.1 \
    pytorch==2.1.1 torchvision==0.16.1 pytorch-cuda=12.1

# 4) Activate the conda environment
conda activate emat

# 5) Install additional required packages using pip
pip install -r requirements.txt
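
Optionally, you can sanity-check the resulting environment with a quick PyTorch import. This extra step is not part of the original setup; it only confirms that PyTorch was installed and that CUDA is visible:

# 6) (Optional) Check PyTorch version and CUDA availability
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"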

Dataset Preparation

This project uses the PASCAL-5i and COCO-20i datasets. After downloading both datasets, organize them in the following directory structure:

<DIR_WITH_DATASETS>/
    PASCAL/
        JPEGImages/
        SegmentationClassAug/
        ...
    COCO/
        annotations/
        train2014/
        val2014/
        ...
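
If the datasets are already stored elsewhere on disk, one way to obtain this layout without copying data is to link them into a common root. The source paths below are placeholders; point them at your existing dataset copies:

# Placeholder paths -- adapt to where your dataset copies live
mkdir -p <DIR_WITH_DATASETS>
ln -s /path/to/pascal_voc <DIR_WITH_DATASETS>/PASCAL
ln -s /path/to/coco2014 <DIR_WITH_DATASETS>/COCO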

Training

EMAT can be trained for Few-Shot Classification and Segmentation (FS-CS) or Few-Shot Segmentation (FS-S). Below, we describe how to launch training for each task. We trained on three NVIDIA RTX A6000 GPUs (48 GB); you can train EMAT with fewer GPUs by adjusting the batch size in the configuration file located in configs/.

Prerequisites

  1. The PASCAL-5i and COCO-20i datasets must be organized as described in the Dataset Preparation section. Additionally, set the path <DIR_WITH_DATASETS> in the configuration file under DATA.PATH.
  2. EMAT uses a ViT-S/14 (without registers) pre-trained with DINOv2 as its backbone. Download the corresponding checkpoint and set its path in the configuration file under METHOD.BACKBONE_CHECKPOINT (see the configuration sketch after this list).
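
For reference, a minimal configuration excerpt might look like the following. The key names DATA.PATH and METHOD.BACKBONE_CHECKPOINT come from the prerequisites above; the exact nesting and remaining entries should be taken from the provided files in configs/ (e.g., configs/emat-pascal.yaml), and the paths below are placeholders:

# Illustrative excerpt -- check configs/emat-pascal.yaml for the exact layout
DATA:
  PATH: /path/to/DIR_WITH_DATASETS        # directory containing PASCAL/ and COCO/
METHOD:
  BACKBONE_CHECKPOINT: /path/to/dinov2_vits14_pretrain.pth   # DINOv2 ViT-S/14 weights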

Few-Shot Classification and Segmentation (FS-CS)

Each dataset includes four folds. For example, to train EMAT on Fold-0 of PASCAL-5i, run:

python main.py \
       --config_path configs/emat-pascal.yaml \
       --fold 0 \
       --way 1 \
       --shot 1 \
       --gpus 0,1,2

Few-Shot Segmentation (FS-S)

Similarly, to train EMAT on Fold-0 of PASCAL-5i for segmentation only, run:

python main.py \
       --config_path configs/ematseg-pascal.yaml \
       --fold 0 \
       --way 1 \
       --shot 1 \
       --only_seg \
       --no_empty_masks \
       --gpus 0,1,2

Checkpoints

We demonstrate that EMAT outperforms the recent state of the art (CST) across different evaluation settings. For a fair comparison, we updated CST (denoted CST*) to use the same backbone as EMAT (i.e., DINOv2 instead of DINO). The final checkpoints are provided in Table 1.

Table 1. Comparison of EMAT and the previous SOTA (CST*) in FS-CS on PASCAL-5i and COCO-20i across all evaluation settings: original, partially augmented, and fully augmented, using 2-way 1-shot tasks (base configuration).

Dataset      Method   Checkpoint   Original         Partially Augmented   Fully Augmented
                                   Acc.    mIoU     Acc.    mIoU          Acc.    mIoU
PASCAL-5i    CST*     download     80.58   63.28    80.60   63.23         78.57   63.08
PASCAL-5i    EMAT     download     82.70   63.38    82.92   63.32         81.23   63.24
COCO-20i     CST*     download     78.70   51.47    78.87   51.53         71.18   50.76
COCO-20i     EMAT     download     80.07   52.81    80.25   52.82         73.00   51.99

Additional checkpoints can be found here.

Evaluation

As explained earlier, EMAT can be used for both Few-Shot Classification and Segmentation (FS-CS) and Few-Shot Segmentation (FS-S). To evaluate a checkpoint for FS-CS (either one obtained after training or one of our provided checkpoints), run:

python main.py \
       --experiment_path <PATH_TO_EXPERIMENT> \
       --fold {0, 1, 2, 3} \
       --way 2 \
       --shot 1 \
       --setting {original, partially-augmented, fully-augmented} \
       --eval
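
For example, to evaluate a PASCAL-5i checkpoint from Table 1 on Fold-0 under the original setting, the call could look as follows. The experiment directory name is only illustrative; use the path where you stored the checkpoint:

python main.py \
       --experiment_path experiments/emat_pascal_fold0 \
       --fold 0 \
       --way 2 \
       --shot 1 \
       --setting original \
       --eval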

To evaluate a checkpoint on our splits based on object size, run:

python main.py \
       --experiment_path <PATH_TO_EXPERIMENT> \
       --fold {0, 1, 2, 3} \
       --way 1 \
       --shot 1 \
       --object_size_split {0-5, 5-10, 10-15, 0-15} \
       --eval \
       --no_empty_masks

Finally, to evaluate a checkpoint on FS-S, use:

python main.py \
       --experiment_path <PATH_TO_EXPERIMENT> \
       --fold {0, 1, 2, 3} \
       --way 1 \
       --shot 1 \
       --eval \
       --only_seg \
       --no_empty_masks

Evaluation Scripts

We also provide evaluation scripts in the evaluation/ directory to reproduce the results presented in our paper.

Table 2. Available evaluation scripts.

Script Name Description
eval_emat.sh Evaluates EMAT on all folds of PASCAL-5i and COCO-20i across all evaluation settings and object size splits (FS-CS).
eval_ematseg.sh Evaluates EMAT on all folds of PASCAL-5i and COCO-20i using 1-way 1- and 5-shot tasks (FS-S).
eval_cst.sh Evaluates CST* on all folds of PASCAL-5i and COCO-20i across all evaluation settings and object size splits (FS-CS).
eval_cst-large.sh Evaluates CST* with a larger support dimension on all folds of PASCAL-5i, using both the full dataset and only small-object subsets (FS-CS).

To reproduce all results:

  1. Download all checkpoints provided here, and place them in the experiments/ directory.
  2. Run each script as follows:
cd evaluation
bash <SCRIPT_NAME> <GPU_ID>
  3. After executing all scripts, process the results using the following command:
python process_results.py
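
For example, evaluating EMAT for FS-CS on GPU 0 and then collecting the results would look as follows:

cd evaluation
bash eval_emat.sh 0
python process_results.py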

Citation

If you find our work helpful, please consider citing the following paper and giving the repo a ⭐.

@inproceedings{carrion2025emat,
    title={Efficient Masked Attention Transformer for Few-Shot Classification and Segmentation},
    author={Dustin Carrión-Ojeda and Stefan Roth and Simone Schaub-Meyer},
    booktitle={Proceedings of the German Conference on Pattern Recognition (GCPR)},
    year={2025},
}

Acknowledgements

We acknowledge the authors of CST and DINOv2 for open-sourcing their implementations. This project was funded by the Hessian Ministry of Science and Research, Arts and Culture (HMWK) through the project "The Third Wave of Artificial Intelligence - 3AI". The project was further supported by the Deutsche Forschungsgemeinschaft (German Research Foundation, DFG) under Germany's Excellence Strategy (EXC 3057/1 "Reasonable Artificial Intelligence", Project No. 533677015). Stefan Roth acknowledges support by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 866008).
