
πŸ–ΌοΈ ImagenWorld


ImagenWorld: Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks

ImagenWorld is a large-scale, human-centric benchmark designed to stress-test image generation models in real-world scenarios.

  • Broad coverage across 6 domains: Artworks, Photorealistic Images, Information Graphics, Textual Graphics, Computer Graphics, and Screenshots.
  • Rich supervision: ~3.6K condition sets and ~20K fine-grained human annotations enable comprehensive, reproducible evaluation.
  • Explainable evaluation pipeline: We decompose generated outputs via object/segment extraction to identify entities (objects, fine-grained regions), supporting both scalar ratings and object-/segment-level failure tags.
  • Diverse model suite: We evaluate 14 models in total β€” 4 unified (GPT-Image-1, Gemini 2.0 Flash, BAGEL, OmniGen2) and 10 task-specific baselines (SDXL, Flux.1-Krea-dev, Flux.1-Kontext-dev, Qwen-Image, Infinity, Janus Pro, UNO, Step1X-Edit, IC-Edit, InstructPix2Pix).

📖 Introduction

This repository contains the code for the paper ImagenWorld: Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks. In this paper, we introduce ImagenWorld, a large-scale, human-centric benchmark designed to stress-test image generation models in real-world scenarios. Unlike prior evaluations that focus on isolated tasks or narrow domains, ImagenWorld is organized into six domains: Artworks, Photorealistic Images, Information Graphics, Textual Graphics, Computer Graphics, and Screenshots, and six tasks: Text-to-Image Generation (TIG), Single-Reference Image Generation (SRIG), Multi-Reference Image Generation (MRIG), Text-to-Image Editing (TIE), Single-Reference Image Editing (SRIE), and Multi-Reference Image Editing (MRIE). The benchmark includes 3.6K condition sets and 20K fine-grained human annotations, providing a comprehensive testbed for generative models.

To support explainable evaluation, ImagenWorld applies object- and segment-level extraction to generated outputs, identifying entities such as objects and fine-grained regions. This structured decomposition enables human annotators to provide not only scalar ratings but also detailed tags of object-level and segment-level failures.
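To make this structure concrete, the sketch below models a condition set and a fine-grained human annotation as Python dataclasses. It is a minimal illustration under stated assumptions, not the released data format (the evaluation scripts and annotated masks are still forthcoming; see the release plan below): every class and field name here is hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List

# NOTE: hypothetical schema sketch; all names below are illustrative
# assumptions, not the released ImagenWorld data format.

class Domain(Enum):
    ARTWORKS = "Artworks"
    PHOTOREALISTIC = "Photorealistic Images"
    INFORMATION_GRAPHICS = "Information Graphics"
    TEXTUAL_GRAPHICS = "Textual Graphics"
    COMPUTER_GRAPHICS = "Computer Graphics"
    SCREENSHOTS = "Screenshots"

class Task(Enum):
    TIG = "Text-to-Image Generation"
    SRIG = "Single-Reference Image Generation"
    MRIG = "Multi-Reference Image Generation"
    TIE = "Text-to-Image Editing"
    SRIE = "Single-Reference Image Editing"
    MRIE = "Multi-Reference Image Editing"

@dataclass
class ConditionSet:
    """One benchmark input: a text prompt plus any reference/source images."""
    condition_id: str
    domain: Domain
    task: Task
    prompt: str
    reference_images: List[str] = field(default_factory=list)  # image paths

@dataclass
class EntityFailure:
    """A failure tag attached to one extracted object or segment."""
    entity_id: str    # id of the extracted object or fine-grained region
    failure_tag: str  # e.g. "missing object" or "garbled text"

@dataclass
class HumanAnnotation:
    """One annotator's judgment of one model output for one condition set."""
    condition_id: str
    model_name: str                # e.g. "GPT-Image-1"
    scalar_rating: float           # overall quality score
    object_failures: List[EntityFailure] = field(default_factory=list)
    segment_failures: List[EntityFailure] = field(default_factory=list)
```

Under this sketch, a model's aggregate score would simply be the mean of scalar_rating across its annotations, while the object- and segment-level failure lists support the per-entity error analysis described above.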


🚀 Release Plan

We will release the evaluation scripts and annotated masks.

Stay tuned for updates!

Citation

If you find our work useful for your research, please consider citing our paper:

@article{
}
