Robin Hesse1*, Doğukan Bağcı1*, Bernt Schiele2, Simone Schaub-Meyer1,3, Stefan Roth1,3
1Technical University of Darmstadt 2Max Planck Institute for Informatics, SIC 3hessian.AI
- Table of contents
- News
- Benchmark description
- Interactive plot
- How to install and run the project
- Model zoo
- Citation
- 06.04.2025: Project page is online.
- 24.03.2025: Paper and code are released.
QUBA (Quality Understanding Beyond Accuracy) is a holistic benchmark designed to evaluate computer vision models across 9 distinct quality dimensions.
- Accuracy: Standard Top-1 performance on the clean ImageNet validation set.
- Adversarial Robustness: Measures the model's resistance to malicious perturbations and attacks.
- Corruption Robustness: Evaluates performance on image corruptions (noise, blur, weather, etc.) using ImageNet-C.
- OOD Robustness: Tests the model's accuracy on out-of-distribution samples.
- Calibration Error: Quantifies how well the predicted confidence scores align with the actual accuracy (see the illustrative sketch after this list).
- Class Balance: Analyzes the uniformity of performance across different classes, measuring consistent behavior across the entire dataset.
- Shape Bias: Determines whether the model relies more on object shape (human-like perception) or on texture details for classification.
- Object Focus: Measures the model's ability to base its decisions on the foreground rather than the background.
- Parameters: The size of the model, enabling analysis of the trade-offs between model scale and behavioral quality.
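As an illustration of the calibration dimension, below is a minimal sketch of the widely used Expected Calibration Error (ECE): predictions are grouped into confidence bins, and the gap between average confidence and accuracy is averaged over the bins. This is only an illustrative example; the estimator actually used by the benchmark is the one implemented in this repository.

import torch

def expected_calibration_error(confidences, correct, n_bins=15):
    """Minimal ECE sketch: confidences are the predicted max-probabilities (N,),
    correct is a 0/1 tensor (N,) indicating whether the prediction was right."""
    bin_edges = torch.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            bin_acc = correct[in_bin].float().mean()   # accuracy inside the bin
            bin_conf = confidences[in_bin].mean()      # mean confidence inside the bin
            ece += in_bin.float().mean() * (bin_acc - bin_conf).abs()
    return float(ece)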
Explore the data from our experiments with this interactive scatter plot! Easily visualize relationships between quality dimensions and uncover new insights by hovering over and filtering the data points.
The following shows how to set up the environment with conda:
# Choose a folder to clone the repository
git clone https://github.com/visinf/beyond-accuracy.git
# Move to the cloned folder
cd beyond-accuracy
# Use the provided environment.yml file to create the conda environment
conda env create -f environment.yml
# Activate the environment
conda activate quba
# After setting up and activating the environment,
# move into the beyond-accuracy folder
cd beyond-accuracy
# Before starting the experiments, specify the directory containing the datasets
# and the location of the helper directory in the quba_constants.py file.
# Only the constants _DATA_DIR and _PROJ_DIR have to be changed.
# The quba_constants.py file is located in ./QUBA/helper.
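# For reference, the two constants might look like this (the paths below are
# placeholders, adjust them to your setup):
#   _DATA_DIR = "/path/to/datasets"
#   _PROJ_DIR = "/path/to/beyond-accuracy"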
# The needed datasets are downloaded into the dataset directory.
# The ImageNet dataset is not downloaded automatically, since you need to apply for
# access on the official ImageNet website: https://image-net.org/download-images
# The ImageNet-C dataset is also not downloaded automatically; you can get it from
# the corresponding GitHub page: https://github.com/hendrycks/robustness
# Now you can start the experiments.
# (The following is only an example of how to test a ResNet50 on the metrics
# we used for our analysis; please refer to the documentation below for more options.)
python evaluate.py --model ResNet50 --params --accuracy --adv_rob --c_rob --ood_rob --object_focus --calibration_error --class_balance --shape_bias --batch_size 32 --file results.xlsx --device cuda:0
# The runtime depends on the device you are using for computation.
# The raw results are stored in the specified Excel file, which will be
# placed in the same folder as the evaluate.py file.
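To quickly inspect the raw results afterwards you can, for example, open the Excel file with pandas (this assumes pandas and openpyxl are available in the quba environment):

import pandas as pd

# Read the raw results written by evaluate.py
df = pd.read_excel("results.xlsx")
print(df.head())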
| Argument | Explanation |
|---|---|
| --model | Refers to the model you want to test. Instead of a single model, you can also test models group-wise (e.g., CNN, TRA) or even all at once. The default is ALL. |
| --file | Excel file to which the results are written after each run. The filename should end with .xlsx. The default value is results.xlsx. |
| --device | Specifies the device on which the computations are done. Default is cuda:0. |
| --batch_size | The batch size used for loading the images. Default is 32. |
| --num_workers | Number of subprocesses used for loading the data. Default is 10. |
| --accuracy | Measures the Accuracy. (Optional) |
| --adv_rob | Measures the Adversarial Robustness. (Optional) |
| --c_rob | Measures the Corruption Robustness. (Optional) |
| --ood_rob | Measures the OOD Robustness. (Optional) |
| --object_focus | Measures the Object Focus. (Optional) |
| --calibration_error | Measures the Calibration Error. (Optional) |
| --class_balance | Measures the Class Balance. (Optional) |
| --shape_bias | Measures the Shape Bias. (Optional) |
| --params | Measures the number of parameters. (Optional) |
| --compute_corr | Computes the rank correlation between dimensions. This is only possible when evaluating at least two models and at least two quality dimensions. (Optional) |
| --quba_weights | Sets the weights for the QUBA score computation; the default is the standard weighting. The weights need to be passed as an array (e.g., [1,2,3,4,5,6,7,8,9]). Custom weights can only be used when all quality dimensions are measured; see the example command after this table. (Optional) |
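As an example of the last two flags, the following hypothetical call evaluates the CNN group on all nine quality dimensions, computes the rank correlations between dimensions, and applies custom QUBA weights (depending on your shell, the weight array may need to be quoted):

# Evaluate the CNN group on all quality dimensions with rank correlations and custom weights
python evaluate.py --model CNN --params --accuracy --adv_rob --c_rob --ood_rob --object_focus --calibration_error --class_balance --shape_bias --compute_corr --quba_weights [1,2,3,4,5,6,7,8,9] --batch_size 32 --file results.xlsx --device cuda:0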
The example below shows how you can easily add your own model to the QUBA model zoo.
# Add the following lines to the load_model(...) function in /helper/generate_data.py
if type == "Name_of_your_Model":
    model = ...        # load your model
    transform = ...    # specify the image transformations for your model
    return StandardModel(model=model, model_name=model_name, transform=transform)
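As a concrete, hypothetical illustration, an entry for a torchvision ResNet-18 could look as follows; the model name "My-ResNet18" is a placeholder, and StandardModel is used exactly as in the pattern above:

import torchvision

if type == "My-ResNet18":  # placeholder model name
    weights = torchvision.models.ResNet18_Weights.IMAGENET1K_V1
    model = torchvision.models.resnet18(weights=weights)  # pretrained ImageNet weights
    transform = weights.transforms()                       # matching preprocessing
    return StandardModel(model=model, model_name=model_name, transform=transform)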
# In evaluate.py, please add your model to the choices of the --model argument
parser.add_argument("--model", required=True,
                    choices=[
                        ...,                    # existing model names
                        "Name_of_your_Model",
                    ],
                    help='...')
# Now you can use your model in the experiments
python evaluate.py --model Name_of_your_Model --params --accuracy --adv_rob --shape_bias --batch_size 32 --file results.xlsx --device cuda:0
Our model zoo includes 326 models from the computer vision literature. Click below to expand the full list of sources and weights.
Click to expand the full Model Zoo list
| Source | Models |
|---|---|
| torchvision | AlexNet, GoogLeNet, VGG11, VGG13, VGG16, VGG19, VGG11-bn, VGG13-bn, VGG16-bn, VGG19-bn, ResNet18, ResNet34, ResNet50, ResNet101, ResNet152, WRN-50-2, WRN-101-2, SqueezeNet, InceptionV3, ResNeXt50-32x4d, ResNeXt101-32x8d, ResNeXt101-64x4d, DenseNet121, DenseNet161, DenseNet169, DenseNet201, MobileNetV2, ShuffleNet-v2-05, ShuffleNet-v2-1, ShuffleNet-v2-15, ShuffleNet-v2-2, MobileNetV3-s, MobileNetV3-l, MnasNet-05, MnasNet-075, MnasNet-1, MnasNet-13, EfficientNet-B0, EfficientNet-B1, EfficientNet-B2, EfficientNet-B3, EfficientNet-B4, EfficientNet-B5, EfficientNet-B6, EfficientNet-B7, RegNet-y-400mf, RegNet-y-800mf, RegNet-y-1-6gf, RegNet-y-3-2gf, RegNet-y-8gf, RegNet-y-16gf, RegNet-y-32gf, VIT-b-16, VIT-l-16, VIT-b-32, VIT-l-32, Swin-T, Swin-S, Swin-B, MaxViT-t, SwinV2-T-Win8, SwinV2-S-WIn8, SwinV2-B-Win8, ConvNext-T, ConvNext-S, ConvNext-B, ConvNext-L |
| PyTorch-image-models | InceptionV4, Inception-ResNetv2, Xception, NasNet-l, MobileNetV3-l-21k, NS-EfficientNet-B0, NS-EfficientNet-B1, NS-EfficientNet-B2, NS-EfficientNet-B3, NS-EfficientNet-B4, NS-EfficientNet-B5, NS-EfficientNet-B6, NS-EfficientNet-B7, BiTM-resnetv2-50x1, BiTM-resnetv2-50x3, BiTM-resnetv2-101x1, BiTM-resnetv2-152x2, EfficientNet-v2-S, EfficientNet-v2-S-21k, EfficientNet-v2-M, EfficientNet-v2-M-21k, EfficientNet-v2-L, EfficientNet-v2-L-21k, DeiT-t, DeiT-s, DeiT-b, ConViT-t, ConViT-s, ConViT-b, CaiT-xxs24, CaiT-xs24, CaiT-s24, CrossViT-9dagger, CrossViT-15dagger, CrossViT-18dagger, XCiT-s24-16, XCiT-m24-16, XCiT-l24-16, LeViT-128, LeViT-256, LeViT-384, PiT-t, PiT-xs, PiT-s, PiT-b, CoaT-t-lite, CoaT-mi-lite, CoaT-s-lite, CoaT-me-lite, MaxViT-b, MaxViT-l, DeiT3-s, DeiT3-s-21k, DeiT3-m, DeiT3-m-21k, DeiT3-b, DeiT3-b-21k, DeiT3-l, DeiT3-l-21k, MViTv2-t, MViTv2-s, MViTv2-b, MViTv2-l, SwinV2-t-W16, SwinV2-s-Win16, SwinV2-b-Win16, SwinV2-b-Win12to16-21k, SwinV2-l-Win12to16-21k, ViT-t5-16, ViT-t5-16-21k, ViT-t11-16, ViT-t11-16-21k, ViT-t21-16, ViT-t21-16-21k, ViT-s-16, ViT-s-16-21k, ViT-b-16-21k, ViT-b-32-21k, ViT-l-16-21k, ViT-l-32-21k, ConvNext-T-21k, ConvNext-S-21k, ConvNext-B-21k, ConvNext-L-21k, BeiT-b, EfficientFormer-l1, EfficientFormer-l3, EfficientFormer-l7, DaViT-t, DaViT-s, DaViT-b, ConvNextV2-N, ConvNextV2-N-21k, ConvNextV2-T, ConvNextV2-T-21k, ConvNextV2-B, ConvNextV2-B-21k, ConvNextV2-L, ConvNextV2-L-21k, EVA02-t-21k, EVA02-s-21k, EVA02-b-21k, InceptionNext-t, InceptionNext-s, InceptionNext-b, FastViT-sa12, FastViT-sa24, FastViT-sa36, SeNet154, ResNet50d, ResNeXt50-32x4d-YFCCM100, ResNet50-yfcc100m, ResNet50-ig1B, ResNeXt101-32x8d-IG1B, ResNeXt50-32x4d-IG1B, ResNet18-IG1B, vit-t-16-21k, EfficientNet-b0-A1, EfficientNet-b1-A1, EfficientNet-b2-A1, EfficientNet-b3-A1, EfficientNet-b4-A1, EfficientNetv2-M-A1, EfficientNetv2-S-A1, RegNety-040-A1, RegNety-080-A1, RegNety-160-A1, RegNety-320-A1, ResNet101-A1, ResNet152-A1, ResNet18-A1, ResNet34-A1, ResNet50-A1, ResNet50d-A1, ResNext50-32x4d-A1, SeNet154-A1, EfficientNet-b0-A2, EfficientNet-b1-A2, EfficientNet-b2-A2, EfficientNet-b3-A2, EfficientNet-b4-A2, EfficientNetv2-M-A2, EfficientNetv2-S-A2, RegNety-040-A2, RegNety-080-A2, RegNety-160-A2, RegNety-320-A2, ResNet101-A2, ResNet152-A2, ResNet18-A2, ResNet34-A2, ResNet50-A2, ResNet50d-A2, ResNext50-32x4d-A2, SeNet154-A2, EfficientNet-b0-A3, EfficientNet-b1-A3, EfficientNet-b2-A3, EfficientNet-b3-A3, EfficientNet-b4-A3, EfficientNetv2-M-A3, EfficientNetv2-S-A3, RegNety-040-A3, RegNety-080-A3, RegNety-160-A3, RegNety-320-A3, ResNet101-A3, ResNet152-A3, ResNet18-A3, ResNet34-A3, ResNet50-A3, ResNet50d-A3, ResNext50-32x4d-A3, SeNet154-A3, RegNet-y-4gf |
| wielandbrendel | BagNet9, BagNet17, BagNet33 |
| RobustBench | Salman2020Do-RN50-2, Salman2020Do-RN50, Liu2023Comprehensive-Swin-B, Liu2023Comprehensive-Swin-L, Liu2023Comprehensive-ConvNeXt-B, Liu2023Comprehensive-ConvNeXt-L, Singh2023Revisiting-ConvNeXt-T-ConvStem, Singh2023Revisiting-ConvNeXt-S-ConvStem, Singh2023Revisiting-ConvNeXt-B-ConvStem, Singh2023Revisiting-ConvNeXt-L-ConvStem, Singh2023Revisiting-ViT-B-ConvStem, Singh2023Revisiting-ViT-S-ConvStem |
| Hiera | Hiera-T, Hiera-S, Hiera-B, Hiera-B-Plus, Hiera-L |
| Microsoft | BeiTV2-b |
| Facebook Research (MAE, DINO, DINOv2) | vit-b-16-mae-ft, ViT-b-16-DINO-LP, ResNet50-DINO-LP, ViT-s-16-DINO-LP, ViT-l-14-dinoV2-LP, ViT-b-14-dinoV2, ViT-s-14-dinoV2-LP, ViT-l-14-dinov2-reg-LP, ViT-b-14-dinov2-reg-LP, ViT-s-14-dinov2-reg-LP |
| HuggingFace | siglip-b-16, siglip-l-16, CLIP-B16-DataCompXL, CLIP-B16-Laion2B, CLIP-B16-CommonPool-XL-DFN2B, CLIP-L14-OpenAI, CLIP-L14-DataCompXL, CLIP-L14-Laion2B, CLIP-L14-CommonPool-XL-DFN2B, ViT-B-16-SigLIP2, ViT-L-16-SigLIP2-256, CLIP-B16-V-OpenAI, CLIP-B16-V-Laion2B, CLIP-B32-V-OpenAI, CLIP-B32-V-Laion2B |
| OpenAI | clip-resnet50, clip-vit-b-16, clip-resnet101, clip-vit-b-32 |
| Apple | mobileclip-s0, mobileclip-s1, mobileclip-s2, mobileclip-b, mobileclip-blt |
| moboehle | bcos-convnext-base, bcos-convnext-tiny, bcos-DenseNet121, bcos-DenseNet161, bcos-DenseNet169, bcos-DenseNet201, bcos-ResNet152, bcos-ResNet18, bcos-ResNet34, bcos-ResNet50, bcos-simple-vit-b-patch16-224, bcos-ResNet101 |
| OpenCLIP | metaclip-b16, convnext-large-d-clip, metaclip-l14, convnext-base-w-320-clip, convnext-large-d-320-clip |
| Trained by us (checkpoints will be published upon acceptance of the paper) | Hiera-B-LP, Hiera-S-LP, Hiera-T-LP, ViTB-DINO-FT, ResNet50-DINO-FT, vit-b-16-mae-lp, ViT-l-14-dinoV2-FT, ViT-b-14-dinoV2-FT, ViT-s-14-dinoV2-FT, ViT-l-14-dinoV2-FT-Reg, ViT-b-14-dinoV2-FT-Reg, ViT-s-14-dinoV2-FT-Reg |
You can list all available models and model groups by calling the list_models() function in /helper/generate_data.py.
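For example (assuming list_models can be imported from the helper package and returns the available names):

from helper.generate_data import list_models

available = list_models()  # assumed to return all supported model names and model groups
print(available)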
If you find this project useful, please consider citing:
@article{Hesse:2025:beyond_accuracy,
title={Beyond Accuracy: What Matters in Designing Well-Behaved Models?},
  author={Robin Hesse and Do\u{g}ukan Ba\u{g}c{\i} and Bernt Schiele and Simone Schaub-Meyer and Stefan Roth},
year={2025},
journal={arXiv:2503.17110 [cs.CV]},
}
