Computer Science > Computer Vision and Pattern Recognition

arXiv:2307.06350 (cs)

[Submitted on 12 Jul 2023 (v1), last revised 30 Oct 2023 (this version, v2)]

Title: T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation

Authors: Kaiyi Huang, Kaiyue Sun, Enze Xie, Zhenguo Li, Xihui Liu

Abstract: Despite the stunning ability of recent text-to-image models to generate high-quality images, current approaches often struggle to effectively compose objects with different attributes and relationships into a complex and coherent scene. We propose T2I-CompBench, a comprehensive benchmark for open-world compositional text-to-image generation, consisting of 6,000 compositional text prompts from 3 categories (attribute binding, object relationships, and complex compositions) and 6 sub-categories (color binding, shape binding, texture binding, spatial relationships, non-spatial relationships, and complex compositions). We further propose several evaluation metrics specifically designed to evaluate compositional text-to-image generation and explore the potential and limitations of multimodal LLMs for evaluation. We introduce a new approach, Generative mOdel fine-tuning with Reward-driven Sample selection (GORS), to boost the compositional text-to-image generation abilities of pretrained text-to-image models. Extensive experiments and evaluations are conducted to benchmark previous methods on T2I-CompBench, and to validate the effectiveness of our proposed evaluation metrics and GORS approach. Project page is available at this https URL.
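As a rough illustration of the structure the abstract describes, the following Python sketch encodes the 3 categories and 6 sub-categories as a mapping and shows one plausible reading of "reward-driven sample selection": keep only generated samples whose reward clears a threshold. The `Sample` fields, the `select_by_reward` helper, and the threshold rule are assumptions made for illustration; the abstract does not specify how GORS scores or selects samples.

```python
from dataclasses import dataclass

# Taxonomy exactly as listed in the abstract: 3 categories, 6 sub-categories.
TAXONOMY = {
    "attribute binding": ["color binding", "shape binding", "texture binding"],
    "object relationships": ["spatial relationships", "non-spatial relationships"],
    "complex compositions": ["complex compositions"],
}


def sub_categories():
    """Flatten the taxonomy into a single list of sub-category names."""
    return [sub for subs in TAXONOMY.values() for sub in subs]


@dataclass
class Sample:
    prompt: str      # compositional text prompt
    image_id: str    # hypothetical handle to a generated image
    reward: float    # hypothetical text-image alignment score


def select_by_reward(samples, threshold):
    """Assumed selection rule: keep samples whose reward meets the threshold."""
    return [s for s in samples if s.reward >= threshold]


if __name__ == "__main__":
    candidates = [
        Sample("a red book and a yellow vase", "img_0", 0.91),
        Sample("a red book and a yellow vase", "img_1", 0.42),
    ]
    kept = select_by_reward(candidates, threshold=0.8)
    print(len(sub_categories()), [s.image_id for s in kept])
```

In a fine-tuning loop, the retained samples would serve as the training set for the pretrained model; how the reward is computed and whether it also weights the loss are details left to the paper itself.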

Comments:	Project page: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2307.06350 [cs.CV]
	(or arXiv:2307.06350v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2307.06350

Submission history

From: Kaiyi Huang
[v1] Wed, 12 Jul 2023 17:59:42 UTC (13,191 KB)
[v2] Mon, 30 Oct 2023 11:42:42 UTC (13,449 KB)
