
πŸ–ΌοΈ ImagenWorld


ImagenWorld: Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks

ImagenWorld is a large-scale, human-centric benchmark designed to stress-test image generation models in real-world scenarios.

  • Broad coverage across 6 domains: Artworks, Photorealistic Images, Information Graphics, Textual Graphics, Computer Graphics, and Screenshots.
  • Rich supervision: ~3.6K condition sets and ~20K fine-grained human annotations enable comprehensive, reproducible evaluation.
  • Explainable evaluation pipeline: We decompose generated outputs via object/segment extraction to identify entities (objects, fine-grained regions), supporting both scalar ratings and object-/segment-level failure tags.
  • Diverse model suite: We evaluate 14 models in total β€” 4 unified (GPT-Image-1, Gemini 2.0 Flash, BAGEL, OmniGen2) and 10 task-specific baselines (SDXL, Flux.1-Krea-dev, Flux.1-Kontext-dev, Qwen-Image, Infinity, Janus Pro, UNO, Step1X-Edit, IC-Edit, InstructPix2Pix).

📖 Introduction

This repository contains the code for the paper ImagenWorld: Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks. In this paper, we introduce ImagenWorld, a large-scale, human-centric benchmark designed to stress-test image generation models in real-world scenarios. Unlike prior evaluations that focus on isolated tasks or narrow domains, ImagenWorld is organized into six domains: Artworks, Photorealistic Images, Information Graphics, Textual Graphics, Computer Graphics, and Screenshots, and six tasks: Text-to-Image Generation (TIG), Single-Reference Image Generation (SRIG), Multi-Reference Image Generation (MRIG), Text-to-Image Editing (TIE), Single-Reference Image Editing (SRIE), and Multi-Reference Image Editing (MRIE). The benchmark includes 3.6K condition sets and 20K fine-grained human annotations, providing a comprehensive testbed for generative models.

To support explainable evaluation, ImagenWorld applies object- and segment-level extraction to generated outputs, identifying entities such as objects and fine-grained regions. This structured decomposition enables human annotators to provide not only scalar ratings but also detailed tags of object-level and segment-level failures.
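To make this structure concrete, the sketch below models a condition set and a fine-grained human annotation as Python dataclasses. It is a minimal illustration under stated assumptions, not the released data format (the evaluation scripts and annotated masks are still forthcoming; see the release plan below): every class and field name here is hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List

# NOTE: hypothetical schema sketch; all names below are illustrative
# assumptions, not the released ImagenWorld data format.

class Domain(Enum):
    ARTWORKS = "Artworks"
    PHOTOREALISTIC = "Photorealistic Images"
    INFORMATION_GRAPHICS = "Information Graphics"
    TEXTUAL_GRAPHICS = "Textual Graphics"
    COMPUTER_GRAPHICS = "Computer Graphics"
    SCREENSHOTS = "Screenshots"

class Task(Enum):
    TIG = "Text-to-Image Generation"
    SRIG = "Single-Reference Image Generation"
    MRIG = "Multi-Reference Image Generation"
    TIE = "Text-to-Image Editing"
    SRIE = "Single-Reference Image Editing"
    MRIE = "Multi-Reference Image Editing"

@dataclass
class ConditionSet:
    """One benchmark input: a text prompt plus any reference/source images."""
    condition_id: str
    domain: Domain
    task: Task
    prompt: str
    reference_images: List[str] = field(default_factory=list)  # image paths

@dataclass
class EntityFailure:
    """A failure tag attached to one extracted object or segment."""
    entity_id: str    # id of the extracted object or fine-grained region
    failure_tag: str  # e.g. "missing object" or "garbled text"

@dataclass
class HumanAnnotation:
    """One annotator's judgment of one model output for one condition set."""
    condition_id: str
    model_name: str                # e.g. "GPT-Image-1"
    scalar_rating: float           # overall quality score
    object_failures: List[EntityFailure] = field(default_factory=list)
    segment_failures: List[EntityFailure] = field(default_factory=list)
```

Under this sketch, a model's aggregate score would simply be the mean of scalar_rating across its annotations, while the object- and segment-level failure lists support the per-entity error analysis described above.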


🚀 Release Plan

We will release the evaluation scripts and annotated masks.

Stay tuned for updates!

Citation

If you find our work useful for your research, please consider citing our paper:

@article{
}
